Computer Vision

Utility Functions

mil_vision_tools.auto_canny(image: ndarray, sigma: float = 0.33) ndarray[source]

Returns a binary image of the edges in an image. Uses the median of the image to pick good lower and upper threshold values for Canny, which makes the function adaptive and reduces the tuning required.

Parameters
  • image (np.ndarray) – Grayscale image to find edges in.

  • sigma (float) – Number ranging from zero to one, rating how aggressively to find edges in an image, where 1 will produce the most edges but also the most noise.

Returns

An image with the Canny algorithm applied.

Return type

np.ndarray
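A minimal usage sketch, assuming an OpenCV BGR source frame (the file name is hypothetical):

import cv2
from mil_vision_tools import auto_canny

gray = cv2.cvtColor(cv2.imread('frame.png'), cv2.COLOR_BGR2GRAY)
edges = auto_canny(gray)             # adaptive thresholds from the image median
noisy_edges = auto_canny(gray, 1.0)  # higher sigma: more edges, more noise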

mil_vision_tools.contour_centroid(contour: List[float], M: Optional[Any] = None) Tuple[int, int][source]

Returns the centroid of the contour. If you have already calculated the moments for this contour, pass them in as the second parameter so they are not recalculated.

Parameters
  • contour (np.ndarray) – The array of (x, y) representing a single contour.

  • M (Optional[Any]) – The moments for this contour, as returned by cv2.moments. If None, the moments are calculated from the contour.

Returns

The centroid of the contour.

Return type

Tuple[int, int]
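A short sketch showing how precomputed moments can be reused (the contour is synthetic):

import cv2
import numpy as np
from mil_vision_tools import contour_centroid

mask = np.zeros((100, 100), dtype=np.uint8)
cv2.rectangle(mask, (20, 30), (80, 70), 255, -1)
contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)  # OpenCV 4 signature
M = cv2.moments(contours[0])               # compute the moments once...
cx, cy = contour_centroid(contours[0], M)  # ...and reuse them here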

mil_vision_tools.contour_mask(contour: ndarray, img_shape: Optional[List[int]] = None, mask: Optional[ndarray] = None)[source]

Returns an image with the mask of a filled-in contour, given an image shape.

One of img_shape or mask MUST not be None. If img_shape is set, a new mask image of that size is created. If mask is set, it is zeroed out and the contour is drawn into it.

Parameters
  • contour (np.ndarray) – An array representing the (x, y) position of a contour point.

  • img_shape (Optional[List[int]]) – An array representing the shape of the image. The first value represents the number of rows, while the second value represents the number of columns.

  • mask (Optional[np.ndarray]) – An existing image to reuse as the mask; it is zeroed out before the contour is drawn into it.
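A hedged sketch of both calling conventions (the contour points are synthetic):

import numpy as np
from mil_vision_tools import contour_mask

contour = np.array([[10, 10], [90, 10], [90, 60], [10, 60]], dtype=np.int32).reshape(-1, 1, 2)
fresh = contour_mask(contour, img_shape=(100, 100))  # new 100x100 mask image
scratch = np.empty((100, 100), dtype=np.uint8)
reused = contour_mask(contour, mask=scratch)         # scratch is zeroed, then drawn into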

mil_vision_tools.putText_ul(img: ndarray, text: str, org: ndarray, fontFace: int = 5, fontScale: int = 1, color: int = 255, thickness: int = 2, lineType: int = 8, bottomLeftOrigin: bool = False) None[source]

Puts text on image like cv2.putText but shifts it such that the origin is the upper left corner of where the text is placed, instead of the bottom left as OpenCV does by default.

Parameters
  • img (np.ndarray) – The source image to add text to.

  • text (str) – The text to add to the image.

  • org (np.ndarray) – An array of [x, y] representing the origin of the text in the image.

  • fontFace (int) – The font to use. Defaults to cv2.FONT_HERSHEY_COMPLEX_SMALL.

  • fontScale (int) – The scale of the font. Defaults to 1.

  • color (int) – The color of the font. Defaults to 255.

  • thickness (int) – The thickness of the text. Defaults to 2.

  • lineType (int) – The type of line to draw in the image. Defaults to 8.

  • bottomLeftOrigin (bool) – Whether to use the bottom left corner as the origin. If False, then the top left corner is used instead. Defaults to False.
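A small sketch contrasting the anchor behavior with cv2.putText:

import numpy as np
from mil_vision_tools import putText_ul

img = np.zeros((480, 640, 3), dtype=np.uint8)
# (10, 10) becomes the UPPER-left corner of the rendered text,
# whereas cv2.putText would treat it as the bottom-left corner.
putText_ul(img, 'target', np.array([10, 10]))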

mil_vision_tools.points_in_image(camera: PinholeCameraModel, points: ndarray) ndarray[source]

Returns an Mx2 np array of image points produced by projecting the given 3D points. Only points that project within the camera's resolution are included in the output; points outside it are ignored.

Parameters
  • camera (PinholeCameraModel) – PinholeCameraModel instance of the camera.

  • points (np.ndarray) – Nx3 np array of 3D points in the camera frame.

Returns

Mx2 of projected image points which are within camera resolution.

Return type

np.ndarray

mil_vision_tools.roi_enclosing_points(camera: PinholeCameraModel, points: ndarray, border: Tuple[int, int] = (0, 0)) Optional[Tuple[slice, slice]][source]

Gets the region of interest in the image which encloses the projected 3D points. Output is given in slice format, so the user can easily slice the image.

>>> roi = roi_enclosing_points(camera_model, object_points)
>>> img_object = img[roi]
>>> cv2.imshow('Object', img_object)
Parameters
  • camera_model (PinholeCameraModel) – The model representing the pinhole camera.

  • points (np.ndarray) – Nx3 np array of 3D points in the camera frame.

  • border (Tuple[int, int]) – Extra pixels to add around region of interest.

Returns

Region of interest tuple that can be used to slice image in format (slice(ymin, ymax), slice(xmin, xmax)) or None if none of the points can be seen in the image.

Return type

Optional[Tuple[slice, slice]]

mil_vision_tools.rect_from_roi(roi: Tuple[slice, slice])[source]

Return a rectangle-style tuple from an ROI slice tuple, like the one returned from roi_enclosing_points.

>>> roi = roi_enclosing_points(camera_model, object_points)
>>> rectangle = rect_from_roi(roi)
>>> cv2.rectangle(img, rectangle[0], rectangle[1], (0, 255, 0), 3)
Parameters

roi (Tuple[slice, slice]) – Region of interest slice tuple in form (slice(ymin, ymax), slice(xmin, xmax)).

Returns

Rectangle tuple.

mil_vision_tools.quaternion_from_rvec(rvec: ndarray) ndarray[source]

Converts a rotation vector (like those returned from cv2.solvePnP) into a quaternion.

Parameters

rvec (np.ndarray) – The source rotation vector.

Returns

The resulting quaternion.

Return type

np.ndarray
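A runnable sketch with a synthetic rotation vector:

import numpy as np
from mil_vision_tools import quaternion_from_rvec

rvec = np.array([0.0, 0.0, np.pi / 2])  # 90-degree rotation about the z axis
quat = quaternion_from_rvec(rvec)       # the equivalent quaternion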

mil_vision_tools.create_object_msg(name: str, confidence: Optional[float] = None, center: Optional[ndarray] = None, contour: Optional[ndarray] = None, rect: Optional[ndarray] = None, attributes: str = '')[source]

Helper function to create a mil_msgs/ObjectInImage message.

Only one of center, contour, or rect should be set, depending on what information is needed/available in your application.

Parameters
  • name (str) – Name of the identified object.

  • attributes (str) – Attributes to attach to the message; their purpose and value will vary by application. Defaults to an empty string.

  • confidence (Optional[float]) – Float between 0 and 1 describing the confidence that name is correct for this object. Leave as None if confidence is not known (will be set to -1).

  • center (Optional[np.ndarray]) – [x, y] of the center point of the object.

  • contour (Optional[np.ndarray]) – Nx1x2 or Nx2 numpy array of pixels making up the contour around the object.

  • rect (Optional[np.ndarray]) – A 4-element tuple/array-like representing the bounding box around the object as (x, y, width, height), the representation returned by cv2.boundingRect.

Returns

Message object filled as described above.

Return type

ObjectInImage
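For example, a detection with a bounding box might be packaged like this (the name and values are illustrative):

from mil_vision_tools import create_object_msg

# rect uses the (x, y, width, height) convention of cv2.boundingRect
msg = create_object_msg('buoy', confidence=0.8, rect=(120, 40, 32, 32), attributes='color=red')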

bool mil_vision::larger_contour(const Contour &c1, const Contour &c2)
cv::MatND mil_vision::smooth_histogram(const cv::MatND &histogram, size_t filter_kernel_size = 3, float sigma = 1.0)
std::vector<float> mil_vision::generate_gaussian_kernel_1D(size_t kernel_size = 3, float sigma = 1.0)
std::vector<cv::Point> mil_vision::find_local_maxima(const cv::MatND &histogram, float thresh_multiplier)
std::vector<cv::Point> mil_vision::find_local_minima(const cv::MatND &histogram, float thresh_multiplier)
unsigned int mil_vision::select_hist_mode(std::vector<cv::Point> &histogram_modes, unsigned int target)
unsigned int mil_vision::select_hist_mode(std::vector<cv::Point> &histogram_modes, int target)
void mil_vision::range_from_param(std::string &param_root, Range &range)
void mil_vision::inParamRange(cv::Mat &src, Range &range, cv::Mat &dest)
cv::Mat mil_vision::rotateKernel(const cv::Mat &kernel, float theta, bool deg = false, bool no_expand = false)

Returns a version of the input kernel rotated about its center point.

Parameters
  • kernel – Original kernel.

  • theta – Angle in radians by which to rotate the kernel. Positive angles rotate the kernel counterclockwise.

  • deg – Optional, defaults to false. Will assume that theta is given in degrees if set to true.

  • no_expand – Optional, defaults to false. Will leave the output size the same as the input size. Parts of the original kernel may fall outside the output canvas after the rotation.

cv::Mat mil_vision::makeRotInvariant(const cv::Mat &kernel, int rotations = 8)

Creates a new kernel that is rotationally invariant by averaging rotated instances of the original.

Parameters
  • kernel – Original kernel.

  • rotations – Optional, defaults to 8. The number of the times that the original kernel will be rotated before the rotated versions are averaged together. The rotated kernels will be uniformly spread angularly.

float mil_vision::getRadialSymmetryAngle(const cv::Mat &kernel, float ang_res = 0.1, bool deg = false)

Returns the minimum theta for which a version of the kernel that has been rotated by theta radians will be approximately identical to the original.

Parameters
  • kernel – The input kernel.

  • ang_res – Optional, defaults to 0.1. The result will be a multiple of this angle.

  • deg – Optional, defaults to false. The output will be in degrees instead of radians if set to true.

C++ Type Aliases

template<typename T = pcl::PointXYZ>
using mil_vision::PCD = pcl::PointCloud<T>
template<typename T = pcl::PointXYZ>
using mil_vision::PCDPtr = std::shared_ptr<PCD<T>>
template<typename T>
using mil_vision::SPtrVector = std::vector<std::shared_ptr<T>>
template<typename T>
using mil_vision::UPtrVector = std::vector<std::unique_ptr<T>>

Vision Utility Functions

Eigen::Vector3d mil_vision::kanatani_triangulation(const cv::Point2d &pt1, const cv::Point2d &pt2, const Eigen::Matrix3d &essential, const Eigen::Matrix3d &R)

Computes a triangulation using a method developed by K. Kanatani and described in the linked paper.

This method attempts to triangulate two corresponding points in stereo vision, given the two image points, the essential matrix, and the rotation between the two camera frames.

Warning

This method does not appear to be used for any particular action in the codebase. It may have only been written as a test.

Parameters
  • pt1 – The first point from the first camera.

  • pt2 – The second point from the second camera.

  • essential – The essential matrix to use in the calculation.

  • R – The rotation matrix between the two camera frames.

void mil_vision::statistical_image_segmentation(const cv::Mat &src, cv::Mat &dest, cv::Mat &debug_img, const int hist_size, const float **ranges, const int target, std::string image_name = "Unnamed Image", bool ret_dbg_img = false, const float sigma = 1.5, const float low_thresh_gain = 0.5, const float high_thresh_gain = 0.5)
cv::Mat mil_vision::triangulate_Linear_LS(cv::Mat mat_P_l, cv::Mat mat_P_r, cv::Mat undistorted_l, cv::Mat undistorted_r)
Eigen::Vector3d mil_vision::lindstrom_triangulation(const cv::Point2d &pt1, const cv::Point2d &pt2, const Eigen::Matrix3d &essential, const Eigen::Matrix3d &R)

ContourClassifier

class mil_vision_tools.ContourClassifier(classes: List[str], training_file: Optional[str] = None, labelfile: Optional[str] = None, image_dir: Optional[str] = None)[source]

Abstract class to represent a classifier for contours within images.

See GaussianColorClassifier for an example.

classes

A list of class names (as strings), for example: ['red', 'white', 'blue'].

Type

List[str]

training_file

The name of the training file.

Type

Optional[str]

labelfile

The file containing the labels.

Type

Optional[str]

image_dir

The directory containing the images.

Type

Optional[str]

abstract class property FEATURES: List[str]

For concrete classes, must return a list of strings naming the features returned by get_features.

For example, an overload might return ['red_mean', 'blue_mean', 'green_mean'].

class_to_string(classes: Union[List[int], int]) Union[List[str], str][source]

Maps a single integer or list of integers representing a class index to the corresponding class string in the set of those passed to the constructor.

Parameters

classes (Union[List[int], int]) – The class index, or list of class indices, to find the associated class name(s) of.

Returns

The list of found classes or the one found class.

Return type

Union[List[str], str]

classify(img: ndarray, mask: ndarray) int[source]

Classify a contour.

Parameters
  • img (np.ndarray) – 2D image representation.

  • mask (np.ndarray) – Binary mask image representing the contour.

Returns

The class index that is most probable given the features in that contour.

abstract classify_features(features: ndarray) int[source]

For concrete classes, classify a contour based on its set of features returned from get_features.

Parameters

features (np.ndarray) – A list of numerical features in the order returned by get_features().

Returns

The index of the class which is the most probable classification based on the features.

extract_labels(labelfile: Optional[str] = None, image_dir: Optional[str] = None) Tuple[ndarray, ndarray][source]

Extract features and labeled classes from a project labeled on labelbox.io.

Parameters
  • labelfile (Optional[str]) – The JSON file containing the labels for the project.

  • image_dir (Optional[str]) – The directory where source images for the project can be found.

Raises

Exception – No labels were found.

Returns

Tuple of the features found and classes found.

Return type

Tuple[np.ndarray, np.ndarray]

feature_probabilities(features: ndarray) List[float][source]

Allows child classes to give probabilities for each possible class given a feature vector, instead of just one classification. By default, assigns 1.0 to the classified class and 0.0 to all others.

Returns

The list of probabilities. Each member ranges from 0 to 1.

Return type

List[float]

abstract get_features(img: ndarray, mask: ndarray) ndarray[source]

Returns an array of numerical features based on the image and contour mask. This should be implemented in all child classes.

Parameters
  • img (np.ndarray) – The 2D image to get the features of.

  • mask (np.ndarray) – The contour mask to apply over the image.

Returns

An array of numerical features found.

Return type

np.ndarray

main(params, description='Interface with a contour classifier class')[source]

Allows execution of the class from a command-line interface.

probabilities(img: ndarray, mask: ndarray) List[float][source]

Return a vector of probabilities, one per class in the class list, computed from the features retrieved from the contour with the given mask.

Parameters
  • img (np.ndarray) – 2D image representation.

  • mask (np.ndarray) – Binary mask image representing the contour.

Returns

The list of probabilities, one per class.

Return type

List[float]

read_from_csv(training_file: Optional[str] = None)[source]

Return features and classes from specified training file.

Parameters

training_file (Optional[str]) – The name of the training file.

save_csv(features: npt.ArrayLike, classes: npt.ArrayLike, training_file: Optional[str] = None) None[source]

Save the features and labeled classes to a csv file to be used in later training.

Parameters
  • features (npt.ArrayLike) – The features found.

  • classes (npt.ArrayLike) – The classes used in the classifier.

  • training_file (Optional[str]) – The name of the file to save the data as.

score(features: ndarray, classes: ndarray) float[source]

Returns the classification accuracy based on the set of features and labeled class indices.

Parameters
  • features (np.ndarray) – Array of shape (n_samples, m_features), where each row is a list of features.

  • classes (np.ndarray) – Array of shape (n_samples) with labeled class indices.

Returns

A proportion accuracy, correct_classifications / len(classes).

string_to_class(strings: Union[List[str], str]) Union[List[int], int][source]

Maps a single string or list of strings, each representing a class in the set passed to the constructor, to the corresponding integer index for that class, as used by most training/classification functions.

Parameters

strings (Union[List[str], str]) – The string or list of strings to return the index of.

Returns

The list of indexes (if multiple strings are passed), or the single index if only one string is passed.

Return type

Union[List[int], int]

abstract train(features: ndarray, classes: ndarray)[source]

For concrete classes, train the classifier based on the set of features and their labeled class index.

Parameters
  • features (np.ndarray) – Array with shape (n_samples, m_features), where each row is a list of features in the order returned from get_features().

  • classes (np.ndarray) – List with shape (n_samples) of labeled class index corresponding to each row of features.

train_from_csv(training_file: Optional[str] = None) None[source]

Train the classifier given the labeled features found in training_file, which should be in the format produced by save_csv().

Parameters

training_file (Optional[str]) – The name of the training file.
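A hedged sketch of a concrete subclass, using the mean BGR color inside the mask as the feature set; the class name, feature choice, and nearest-mean classifier are illustrative, not part of the library:

import cv2
import numpy as np
from mil_vision_tools import ContourClassifier

class MeanColorClassifier(ContourClassifier):
    FEATURES = ['blue_mean', 'green_mean', 'red_mean']

    def get_features(self, img, mask):
        return np.array(cv2.mean(img, mask=mask)[:3])  # mean BGR inside the contour

    def train(self, features, classes):
        # Store one mean feature vector per class index
        self._means = np.array([features[classes == i].mean(axis=0)
                                for i in range(len(self.classes))])

    def classify_features(self, features):
        # Nearest stored class mean wins
        return int(np.argmin(np.linalg.norm(self._means - features, axis=1)))

classifier = MeanColorClassifier(['red', 'white', 'blue'])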

Threshold

class mil_vision_tools.Threshold(low: Union[List[int], Tuple[int], ndarray], high: Union[List[int], Tuple[int], ndarray], conversion_code: Optional[str] = None, in_space: str = 'BGR', thresh_space: str = 'BGR')[source]

Helper class to represent the upper and lower bounds of a color threshold in OpenCV-related processes. It can be used to locate only the parts of an image that fall between a lower and an upper color bound.

For example:

Threshold((0, 0, 0), (255, 255, 255))  # Thresholds BGR image in BGR colorspace
Threshold([25, 180, 180], np.array([50, 190, 200]), thresh_space='LAB')  # Thresholds BGR image in LAB
Threshold((50, 50, 50), [200, 200, 200], in_space='HSV', thresh_space='LAB')  # Thresholds HSV image in LAB
low

The lower range of the threshold.

Type

Union[List[int], Tuple[int], np.ndarray]

high

The higher range of the threshold.

Type

Union[List[int], Tuple[int], np.ndarray]

conversion_code

The conversion code describing how to convert the image between two different colorspaces. This may be a string such as COLOR_RGB2GRAY (if a string was passed at initialization), an integer (if None was passed and in_space and thresh_space were used to determine the conversion), or None if no conversion was supplied and the input and threshold spaces are the same.

Type

Optional[Union[str, int]]

in_space

The OpenCV colorspace to use for the image source.

Type

str

thresh_space

The OpenCV colorspace to use for the threshold.

Type

str

Raises

AttributeError – No conversion code could be determined.

create_trackbars(window: Optional[str] = None) None[source]

Create OpenCV GUI trackbars to adjust the threshold values live.

Parameters

window (Optional[str]) – Name of OpenCV window to put trackbars in.

classmethod from_dict(d, in_space: str = 'BGR', thresh_space: Optional[str] = None)[source]

Loads thresholds from a dictionary. See examples for valid dictionaries.

For example, to load a threshold from a dictionary:

>>> a = {'HSV': {'low': [0, 20, 50], 'high': [255, 255, 255]} }
>>> Threshold.from_dict(a)
Threshold([0, 20, 50], [255, 255, 255], thresh_space='HSV')
>>> b = {'LAB': {'low': [0, 66, 66], 'high': [255, 180, 180]} }
>>> Threshold.from_dict(b)
Threshold([0, 66, 66], [255, 180, 180], thresh_space='LAB')
Parameters
  • d (Dict[str, Dict[str, List[int]]]) – The dictionary containing the relevant colorspace information.

  • in_space (str) – The colorspace of the source image. Defaults to BGR.

  • thresh_space (Optional[str]) – The colorspace of the threshold space. Defaults to None.

classmethod from_param(param: str, in_space: str = 'BGR', thresh_space: Optional[str] = None)[source]

Loads thresholds from a ROS parameter name. The value of the parameter is expected to be a proper dictionary, on which from_dict() is called.

Parameters
  • param (str) – The name of the parameter.

  • in_space (str) – The colorspace of the source image. Defaults to BGR.

  • thresh_space (Optional[str]) – The colorspace of the threshold space. Defaults to None.
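For example, assuming a parameter ~color_threshold holds a dictionary like the ones accepted by from_dict():

thresh = Threshold.from_param('~color_threshold')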

ImageMux

class mil_vision_tools.ImageMux(size: Tuple[int, int] = (480, 640), shape: Tuple[int, int] = (2, 2), labels: Optional[List[str]] = None, keep_ratio: bool = True, border_color: Tuple[int, int, int] = (255, 255, 255), border_thickness: int = 1, text_color: Tuple[int, int, int] = (255, 255, 255), text_font: int = 5, text_scale: int = 1, text_thickness: int = 2)[source]

Utility to create a customizable square grid of images, with labels and borders, which will generate a single image from the grid at any time. Useful for combining several debug images into one.

See the mil_vision_tools/image_mux.py file for a usage example.

x[key] = img

Sets the key index in the panes of the grid to have img as the source image for that pane. Equivalent to using set_image().

x()

Returns the grid image, with decorations. Equivalent to calling get_image().

size

Tuple representing the size of the grid image, in pixels.

Type

np.ndarray

shape

Tuple containing number of (rows, cols) of smaller images in the grid.

Type

np.ndarray

labels

List of strings or None of length shape[0] * shape[1]. Each label corresponds to one image in the grid.

Type

List[str]

pane_size

A numpy array of (height, width) representing the size of each pane, in pixels.

Type

np.ndarray

keep_ratio

If True, do not stretch the image to insert it into the grid pane.

Type

bool

border_color

The color of the border to use, in BGR format. Defaults to [255, 255, 255], or white.

Type

Tuple[int, int, int]

border_thickness

The thickness of the border between the images. Defaults to 1.

Type

int

text_color

The color of the text to use for the image labels. Defaults to [255, 255, 255], or white.

Type

Tuple[int, int, int]

text_font

An OpenCV font to use for the image labels.

Type

int

text_scale

Scaling factor for label text. Defaults to 1.

Type

int

text_thickness

Thickness of the label text. Defaults to 2.

Type

int

Parameters
  • size (Tuple[int, int]) – Tuple representing the size of the grid image, in pixels.

  • shape (Tuple[int, int]) – Tuple containing number of (rows, cols) of smaller images in the grid.

  • labels (Optional[List[str]]) – Optional list of strings of length shape[0] * shape[1]. Each label corresponds to one image in the grid.

  • keep_ratio (bool) – If True, do not stretch the image to insert it into the grid pane.

  • border_color (Tuple[int, int, int]) – The color of the border to use, in BGR format. Defaults to [255, 255, 255], or white.

  • border_thickness (int) – The thickness of the border between the images. Defaults to 1.

  • text_color (Tuple[int, int, int]) – The color of the text to use for the image labels. Defaults to [255, 255, 255], or white.

  • text_font (int) – An OpenCV font to use for the image labels.

  • text_scale (int) – Scaling factor for label text. Defaults to 1.

  • text_thickness (int) – Thickness of the label text. Defaults to 2.

get_image() ndarray[source]

Returns the image grid, with labels and borders.

Returns

The grid image, with decorations.

Return type

np.ndarray

property image: ndarray

Returns the decorated grid image. Equivalent to calling get_image().

Returns

The grid image, with decorations.

Return type

np.ndarray

set_image(key: Union[int, Tuple[int, int]], img: ndarray)[source]

Sets the content of one pane in the image grid.

Parameters
  • key (Union[int, Tuple[int, int]]) – The index of the pane to set to the update. If an integer, the pane at that index is updated (counting left to right, then top to bottom). If a tuple, then the pane at (row, col) is updated.

  • img (np.ndarray) – Array with shape (m, n, 3) or (m, n, 1) representing the image to insert in the pane specified in key. If a one-channel image, first convert grayscale to BGR. If keep_ratio was True in constructor, will add black bars as necessary to fill pane. Otherwise, use standard cv2.resize to fit img into pane.

Raises

AssertionError – If key is the wrong type or out of bounds.
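A hedged usage sketch with placeholder images:

import numpy as np
from mil_vision_tools import ImageMux

mux = ImageMux(size=(480, 640), shape=(1, 2), labels=['raw', 'edges'])
raw = np.zeros((480, 640, 3), dtype=np.uint8)    # placeholder debug images
edges = np.zeros((480, 640, 3), dtype=np.uint8)
mux[0] = raw           # same as mux.set_image(0, raw)
mux[(0, 1)] = edges    # (row, col) indexing also works
grid = mux()           # same as mux.get_image()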

ImageSet

class mil_vision_tools.ImageSet[source]

Represents a set of images passed to ImageProc. After the algorithm operates on the raw image, the other attributes are filled in as well.

color_encoding

The color encoding used in the images, if relevant. May be None if never set.

Type

Optional[str]

raw

The raw image used.

Type

Optional[np.ndarray]

mono

The mono image output (in mono8 colorspace).

Type

Optional[np.ndarray]

rect

The rectified image output (using the mono image).

Type

Optional[np.ndarray]

color

The color image output.

Type

Optional[np.ndarray]

rect_color

The rectified image output (using the color image).

Type

Optional[np.ndarray]

ImageProc

class mil_vision_tools.ImageProc[source]

Translation of image_proc processor, originally in C++. Used to debayer and rectify images. Use a bitmask assembled from the below flags to change the behavior of process.

RAW

Flag representing the raw representation of an image. Equal to 0.

Type

int

MONO

Flag representing that the output should include a mono image.

Type

int

RECT

Flag representing that the output should include a rectified image.

Type

int

COLOR

Flag representing that the output should include a color image.

Type

int

RECT_COLOR

Flag representing that the output should include a rectified color image.

Type

int

ALL

Flag representing that all outputs should be completed.

Type

int

bridge

The ROS bridge to OpenCV.

Type

CvBridge

static process(raw_msg: Image, model: Optional[PinholeCameraModel], out: ImageSet, flags: int) None[source]

Given an original image message (sensor_msgs/Image), fill in the out ImageSet with the processed (debayered, rectified, grayscaled) images (in numpy format) based on the flags.

Parameters
  • raw_msg (Image) – Original image message to process.

  • model (Optional[PinholeCameraModel]) – Camera model used to rectify images if RECT or RECT_COLOR flag is set. If images do not need to be rectified, then using None for this attribute is acceptable.

  • out (ImageSet) – The requested images will be set in out based on the flags bitmask.

  • flags (int) – A bitmask of the flags indicating which processed images you want in out. For example, using the ImageProc.RECT_COLOR | ImageProc.RECT mask will fill in the raw, rect_color, and rect images in the output.

Raises

Exception – The model parameter is not an instance of PinholeCameraModel, but a rectified image is desired.
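A sketch of the intended flow; raw_msg and camera_model are assumed to come from your subscriber:

from mil_vision_tools import ImageProc, ImageSet

out = ImageSet()
# Request the mono and rectified outputs; per the flags example above, raw is also filled in
ImageProc.process(raw_msg, camera_model, out, ImageProc.MONO | ImageProc.RECT)
mono_img, rect_img = out.mono, out.rect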

TrackedObject

class mil_vision_tools.TrackedObject(id: int, stamp: rospy.Time, features: list[Any], data: Any | None = None)[source]

Contains a single object tracked by an ObjectsTracker instance.

id

The ID of the tracked object.

Type

int

created_at

A timestamp of when the object was first observed.

Type

rospy.Time

stamp

A timestamp of when the object was most recently observed.

Type

rospy.Time

features

A list of features used by ObjectsTracker.

Type

List[Any]

observations

The number of times the object has been observed.

Type

int

Parameters
  • id (int) – A unique id for this object.

  • stamp (rospy.Time) – Time stamp object was observed.

  • features (List[Any]) – Feature set used by the specific implementation of ObjectsTracker.

  • data (Optional[Any]) – Optional additional application-specific data to track.

property age

Difference in time between when the object was most recently observed and when it was first observed.

Returns

The difference in the times.

Return type

genpy.Duration

update(stamp: rospy.Time, features: list[Any], data: np.ndarray | None = None) None[source]

Update an object’s metadata.

Parameters
  • stamp (rospy.Time) – Time stamp object was just observed in.

  • features (List[Any]) – Feature set used by the specific implementation of ObjectsTracker.

  • data (Optional[np.ndarray]) – Optional data to store with the object. Defaults to None.

ObjectsTracker

class mil_vision_tools.ObjectsTracker(expiration_seconds: int = 5, max_distance: float = 0)[source]

Tracks an arbitrary number of objects over time based on a distance metric. Useful for finicky computer vision programs, ensuring an object was observed in multiple recent frames before being used for autonomy.

This is an abstract class, meaning that you must use a child class that implements the required methods.

expiration_seconds

A duration object representing the maximum time that old objects can be kept before being removed.

Type

rospy.Duration

max_distance

The maximum distance two objects can be apart and still be matched.

Type

float

objects

A list of objects being tracked.

Type

List[TrackedObject]

max_id

The maximum ID in use amongst the objects. This ID is used when constructing new objects, after which it is incremented by one.

Type

int

Parameters
  • expiration_seconds (int) – Longest time, in seconds, for old objects to be kept.

  • max_distance (float) – Maximum distance two objects can be apart and still be matched.

add_observation(stamp: rospy.Time, features: list[Any], data: Any | None = None) TrackedObject[source]

Add a new observation to the tracked objects.

Parameters
  • stamp (rospy.Time) – Time object representing the time this observation arrived.

  • features (List[Any]) – A list of features to find the distance of, relative to the features stored by each object.

  • data (Optional[Any]) – Optional data to store with each object. Defaults to None.

Returns

The object that was first updated, or the new object that was just constructed.

Return type

TrackedObject

clear_expired(now: rospy.Time | None = None)[source]

Deletes expired objects. Should be called frequently to clear expired objects.

Parameters

now (rospy.Time) – The time to compare to each object’s timestamp to see if it has expired and needs to be removed from objects.

get_persistent_objects(min_observations: int = 10, min_age: rospy.Duration = rospy.Duration(0)) list[TrackedObject][source]

Get a list of objects which have persisted sufficiently long.

Parameters
  • min_observations (int) – Minimum number of times the object was seen to be returned. Defaults to 10.

  • min_age (rospy.Duration) – Minimum age an object must be to be returned in result. Defaults to a duration of zero.

Returns

List of tracked objects meeting the above criteria.

Return type

List[TrackedObject]
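A hedged usage sketch; MyTracker stands in for an application-specific concrete subclass of this abstract class:

import rospy

tracker = MyTracker(expiration_seconds=3, max_distance=25.0)  # hypothetical subclass

def handle_detections(stamp, feature_list):
    for features in feature_list:
        tracker.add_observation(stamp, features)
    tracker.clear_expired(now=stamp)  # drop stale objects each frame
    return tracker.get_persistent_objects(min_observations=5, min_age=rospy.Duration(1))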

RectFinder

class mil_vision_tools.RectFinder(length: float = 1.0, width: float = 1.0)[source]

Keeps a model of a rectangle in meters, providing utility functions to find 2D and 3D pose estimations of rectangles matching this model found in a computer vision program.

length

The measurement of the longer side of the rectangle, in meters.

Type

float

width

The measurement of the shorter side of the rectangle, in meters.

Type

float

model_3D

The internal model of the three-dimensional rectangle.

Type

np.ndarray

Initializes the internal model of the rectangle. If width > length, the two will be reversed so that length is always the longer side.

Parameters
  • length (float) – The measurement of the longer side of the rectangle in meters.

  • width (float) – The measurement of the shorter side of the rectangle in meters.

draw_model(size: tuple[int, int] = (500, 500), border: int = 25) np.ndarray[source]

Returns a 1 channel image displaying the internal model of the rectangle. Useful for visually checking that your model is reasonable. Also useful to see what orientation of a contour will be considered 0 rotation in get_pose_3D().

Parameters
  • size (Tuple[int, int]) – A 2 element tuple of (x, y), which will be the resolution of the returned image.

  • border (int) – Size of the border, in pixels, around the model contour in the returned image.

Returns

A one channel image with shape specified by the size param, with the internal model drawn in white surrounded by a black border.

Return type

np.ndarray

classmethod from_polygon(polygon: Polygon) RectFinder[source]

Creates a RectFinder from a geometry_msgs/Polygon message.

The length of the class becomes the range of x or y, whichever is larger. The width becomes the range of x or y, whichever is smaller.

Parameters

polygon (Polygon) – The polygon to construct the class from.

Raises

AssertionError – The polygon parameter is not an instance of Polygon.

Returns

A new instance of the class.

Return type

RectFinder

get_corners(contour: np.ndarray, debug_image: bool | None = None, epsilon_range: tuple[float, float] = (0.01, 0.1), epsilon_step: float = 0.01) np.ndarray | None[source]

Attempts to find the 4 corners of a contour representing a quadrilateral.

If the corners are found, it returns these 4 points sorted as described in sort_corners(). These corners can be used for get_pose_2D() and get_pose_3D().

If a 4 sided polygon cannot be approximated, returns None.

Parameters
  • contour (np.ndarray) – The contour of the image.

  • debug_image (Optional[bool]) – If not None, will draw corners with text for indexes onto image.

  • epsilon_range (Tuple[float, float]) – Tuple of two epsilon values (factor of contour arclength) to try for polygon approx.

  • epsilon_step (float) – How much to increment epsilon each time while iterating through epsilon_range.

Returns

The array, if it can be approximated.

Return type

Optional[np.ndarray]

get_pose_2D(corners: np.ndarray) tuple[Any, Any][source]

Finds the 2D center of the rectangle and a unit direction vector along the length of the rectangle, in pixels.

Parameters
  • corners (np.ndarray) – 4x2 numpy array from get_corners() representing the 4 sorted corners in the image.

Returns

A tuple representing (center, vector).

Return type

Tuple[Any, Any]

get_pose_3D(corners: np.ndarray, intrinsics: np.ndarray | None = None, dist_coeffs: np.ndarray | None = None, cam: PinholeCameraModel | None = None, rectified: bool = False) tuple[Any, Any][source]

Uses the model of the object, the corresponding pixels in the image, and camera intrinsics to estimate a 3D pose of the object.

Either cam and rectified must be provided, or intrinsics and dist_coeffs.

Parameters
  • corners (np.ndarray) – 4x2 numpy array from get_corners() representing the 4 sorted corners in the image.

  • cam (Optional[PinholeCameraModel]) – The camera model.

  • rectified (bool) – If cam is set, set True if corners were found in an already rectified image (image_rect_color topic).

  • intrinsics (np.ndarray) – Camera intrinisic matrix.

  • dist_coeffs (np.ndarray) – Camera distortion coefficients.

Returns

Represents the translation and rotation vector between the camera and the object in meters/radians. Use cv2.Rodrigues to convert rvec to a 3x3 rotation matrix.

Return type

Tuple[Any, Any]
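A sketch tying the corner search to pose estimation; contour and camera_model are assumed inputs from your pipeline:

import cv2
from mil_vision_tools import RectFinder

finder = RectFinder(length=0.9, width=0.6)  # rectangle model in meters
corners = finder.get_corners(contour)
if corners is not None:
    tvec, rvec = finder.get_pose_3D(corners, cam=camera_model, rectified=True)
    R, _ = cv2.Rodrigues(rvec)              # rvec -> 3x3 rotation matrix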

static sort_corners(rect: np.ndarray, debug_image: bool | None = None) np.ndarray[source]

Given a contour of 4 points, returns the same 4 points sorted in a known way. Used so that the indices of the contour line up with those of the model for cv2.solvePnP.

Parameters
  • rect (np.ndarray) – An array representing the contour of 4 points.

  • debug_image (Optional[bool]) – If not None, puts a circle and text of the index in each corner.

p[0] = Top left corner         0--1     1--------2
p[1] = Top right corner        |  |  or |        |
p[2] = Bottom right corner     |  |     0--------3
p[3] = Bottom left corner      3--2
to_polygon() Polygon[source]

Returns a geometry_msgs/Polygon instance representing the four corners of the rectangle where all z = 0.

Returns

A constructed message.

Return type

Polygon

verify_contour(contour: ndarray) Any[source]

Returns a numerical comparison between a contour and the perfect model of the rectangle. A perfectly matched contour returns 0.0.

Useful for filtering contours.

VisionNode

Attributes
Methods
class mil_vision_tools.VisionNode[source]

ABC class used to unify the interfacing for MIL's computer vision scripts. Handles the bootstrapping of image subscription, enable/disable, etc. Provides a callback for new images, which is expected to return a list of objects found in the image.

camera_model

Camera model used throughout the class. Initially set to None, but later set to an instance of the pinhole camera model when enabled.

Type

Optional[PinholeCameraModel]

abstract find_objects(img: ndarray) List[ObjectInImage][source]

Given an image as a source, this abstract method should be overridden to return a list of ObjectInImage.

Parameters

img (np.ndarray) – The source image.

Returns

A list of the objects found in the image.

Return type

List[ObjectInImage]
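A hedged sketch of a concrete subclass; the threshold value and object name are illustrative:

import cv2
from mil_vision_tools import VisionNode, create_object_msg

class BuoyFinder(VisionNode):
    def find_objects(self, img):
        gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
        _, mask = cv2.threshold(gray, 128, 255, cv2.THRESH_BINARY)
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        return [create_object_msg('buoy', contour=c) for c in contours]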

CameraLidarTransformer

Functions

Note

Most of the methods/attributes for this class are private, which is why none appear. To have these documented, either make these public or discuss the option of documenting private C++ class members with a software lead.

class CameraLidarTransformer

Public Functions

CameraLidarTransformer()

Default constructor for the transformer between camera and LIDAR data.

ClosedCurve

class ClosedCurve

Public Functions

ClosedCurve(std::vector<cv::Point2i> points)

Initializes a new closed curve with a given set of points. No operations are completed upon initialization; only internal attributes are stored.

Parameters

points – A list of points which comprise the curve.

void applyPerturbation(const std::vector<uint8_t> &perturbation, int idx)
ClosedCurve perturb(const std::vector<uint8_t> &perturbation, int idx)

Public Static Functions

static bool validateCurve(std::vector<cv::Point2i> &curve)
struct Perturbation

Public Members

size_t idx
uint8_t entry
uint8_t exit
std::vector<uint8_t> route

ActiveContour

Functions
class ActiveContour

Public Functions

ActiveContour()

Default constructor for the ActiveContour class. Does not appear to have a definition.

CameraObserver

class CameraObserver

Public Functions

CameraObserver(ros::NodeHandle &nh, std::string &pcd_in_topic, std::string &cam_topic, size_t hist_size)
inline std::shared_ptr<image_geometry::PinholeCameraModel> getCameraModelPtr() const
inline std::vector<ColorObservation> operator()(const PCD<pcl::PointXYZ>::ConstPtr &pcd)
inline bool ok() const
ColorObservation::VecImg get_color_observations(const PCD<pcl::PointXYZ>::ConstPtr &pcd)

ColorObservation

Attributes
struct ColorObservation

Public Types

using Vec = std::vector<ColorObservation>
using VecImg = cv::Mat_<Vec>

Public Members

float xyz[3]
uint8_t bgr[3]

UnoccludedPointsImg

Functions
class UnoccludedPointsImg

Public Functions

UnoccludedPointsImg()

PointColorStats

Attributes
struct PointColorStats

Public Members

float xyz[3]
uint8_t bgr[3]
float var[3]
int n

PcdColorizer

class PcdColorizer

Public Functions

PcdColorizer(ros::NodeHandle nh, std::string input_pcd_topic)

Public constructor to initialize a new colorizer.

Parameters
  • nh – The node handle to initialize the colorizer with.

  • input_pcd_topic – The name of the topic serving point clouds.

inline bool ok() const

Returns true if ROS and the class are functioning properly.

PcdColorizer(ros::NodeHandle nh, std::string input_pcd_topic, std::string output_pcd_topic, std::string rgb_cam_topic, std::string rgb_cam_frame)
inline ~PcdColorizer()
void _transform_to_cam()
void _color_pcd()

SingleCloudProcessor

Functions
class SingleCloudProcessor

Public Functions

SingleCloudProcessor(ros::NodeHandle nh, std::string &in_pcd_topic, size_t hist_size)

Constructs a new cloud processor.

Parameters
  • nh – The node handle for the class.

  • in_pcd_topic – The name of the topic supplying point clouds.

  • hist_size – The size of the image history buffer.

void operator()(const PCD<pcl::PointXYZ>::ConstPtr &pcd)

Incomplete method. Aside from printing debug statements, it performs no actions.

inline bool ok() const

Returns true if ROS and the class are functioning properly.

PixelType

enum class mil_vision::PixelType

Values:

enumerator _8UC1
enumerator _8SC1
enumerator _16UC1
enumerator _16SC1
enumerator _32SC1
enumerator _32FC1
enumerator _64FC1
enumerator _8UC2
enumerator _8SC2
enumerator _16UC2
enumerator _16SC2
enumerator _32SC2
enumerator _32FC2
enumerator _64FC2
enumerator _8UC3
enumerator _8SC3
enumerator _16UC3
enumerator _16SC3
enumerator _32SC3
enumerator _32FC3
enumerator _64FC3
enumerator _8UC4
enumerator _8SC4
enumerator _16UC4
enumerator _16SC4
enumerator _32SC4
enumerator _32FC4
enumerator _64FC4
enumerator _UNKNOWN

CameraFrame

template<typename cam_model_ptr_t = std::shared_ptr<image_geometry::PinholeCameraModel>, typename img_scalar_t = uint8_t, typename time_t_ = ros::Time, typename float_t = float>
class CameraFrame

Public Functions

inline CameraFrame()

Default constructor for the class. Does not complete any actions; serves as an empty constructor.

inline CameraFrame(const CameraFrame &other)

Copy constructor for the class.

CameraFrame(const sensor_msgs::ImageConstPtr &image_msg_ptr, cam_model_ptr_t &cam_model_ptr, bool is_rectified = false, float_t store_at_scale = 1.0)

Constructs a class instance from a ROS message.

Parameters
  • image_msg_ptr – A pointer to the image message.

  • cam_model_ptr – A pointer to the camera model.

  • is_rectified – Whether the image is rectified (camera geometry has been taken into account).

  • store_at_scale – The scale to store the image at.

inline cam_model_ptr_t getCameraModelPtr() const

Returns a pointer to the camera model.

Returns

The pointer.

inline unsigned int seq() const

Returns the sequence number of camera frames which have come from the same source.

These progressively increase over time as more camera frame objects are generated.

Returns

The sequence number.

inline time_t_ stamp() const

Returns the timestamp associated with the image.

Returns

The timestamp.

inline const cv::Mat_<img_scalar_t> &image() const

Returns an image mat representing the image.

Returns

The image as an OpenCV mat.

inline bool rectified() const

Whether the image has been rectified using the distortion parameters specified by the camera model.

Returns

Whether the rectification has occurred.

inline float_t getImageScale() const

The scale of the image. A scale of 1 represents an image that is its true size and is not scaled.

Returns

The scale.

inline void copyImgTo(cv::Mat dest) const

Copies the image to another OpenCV mat.

inline bool isCameraGeometryKnown() const

Returns true if the camera geometry is NOT specified.

Returns

True if the camera geometry has not been specified.

ImageWithCameraInfo

struct ImageWithCameraInfo

Packages corresponding sensor_msgs::ImageConstPtr and sensor_msgs::CameraInfoConstPtr into one object. Containers of these objects can be sorted by their image_time attribute.

Public Functions

inline ImageWithCameraInfo()
ImageWithCameraInfo(sensor_msgs::ImageConstPtr _image_msg_ptr, sensor_msgs::CameraInfoConstPtr _info_msg_ptr)
inline bool operator<(const ImageWithCameraInfo &right) const

Public Members

sensor_msgs::ImageConstPtr image_msg_ptr
sensor_msgs::CameraInfoConstPtr info_msg_ptr
ros::Time image_time

FrameHistory

class FrameHistory

Object that subscribes itself to an image topic and stores up to a user-defined number of ImageWithCameraInfo objects. The frame history can then be retrieved in whole or in part.

Public Functions

FrameHistory(std::string img_topic, unsigned int hist_size)
~FrameHistory()
void image_callback(const sensor_msgs::ImageConstPtr &image_msg, const sensor_msgs::CameraInfoConstPtr &info_msg)
std::vector<ImageWithCameraInfo> get_frame_history(unsigned int frames_requested)
int frames_available()

Public Members

const std::string topic_name
const size_t history_size

Range

Attributes
struct Range

Helper struct to show a range between two numbers.

Public Members

cv::Scalar lower
cv::Scalar upper

CameraFrameSequence

template<typename cam_model_ptr_t, typename img_scalar_t, typename time_t_, typename float_t = float>
class CameraFrameSequence

Public Types

using CamFrame = CameraFrame<cam_model_ptr_t, img_scalar_t, time_t_, float_t>
using CamFramePtr = std::shared_ptr<CamFrame>
using CamFrameConstPtr = std::shared_ptr<CamFrame const>
using CamFrameSequence = CameraFrameSequence<cam_model_ptr_t, img_scalar_t, time_t_, float_t>
using CircularBuffer = boost::circular_buffer<CamFramePtr>

Public Functions

CameraFrameSequence() = default

Default constructor for the class. Equivalent to the true default constructor.

CameraFrameSequence(const CameraFrameSequence&) = delete

Forbidden copy constructor; explicitly deleted.

CameraFrameSequence(CameraFrameSequence&&) = delete

Forbidden move constructor; explicitly deleted.

virtual ~CameraFrameSequence() = default
virtual size_t size() const = 0
virtual cam_model_ptr_t getCameraModelPtr() const = 0
virtual bool isGeometryConst() const = 0
virtual int rows() const = 0
virtual int cols() const = 0
virtual CamFrameConstPtr getFrameFromTime(time_t_) = 0

Returns a copy of the CameraFrame taken closest to the given time.

Parameters

time_t_ – The time to query.

virtual CamFrameConstPtr operator[](int) = 0

Returns a reference to the nth frame from the most recent. For example, frame_sequence[0] is the most recent frame, frame_sequence[-1] is the oldest frame, and frame_sequence[-2] is the second oldest frame.

CameraModel

Functions
template<typename T = float>
class CameraModel

Public Functions

inline CameraModel()
inline ~CameraModel()

ROSCameraStream

template<typename img_scalar_t = uint8_t, typename float_t = float>
class ROSCameraStream : public mil_vision::CameraFrameSequence<std::shared_ptr<image_geometry::PinholeCameraModel>, uint8_t, ros::Time, float>

Public Types

using CamFrame = CameraFrame<cam_model_ptr_t, img_scalar_t, time_t_, float_t>
using CamFramePtr = std::shared_ptr<CamFrame>
using CamFrameConstPtr = std::shared_ptr<CamFrame const>
using CamFrameSequence = CameraFrameSequence<cam_model_ptr_t, img_scalar_t, time_t_, float_t>
using CircularBuffer = boost::circular_buffer<CamFramePtr>

Public Functions

inline ROSCameraStream(ros::NodeHandle nh, size_t buffer_size)
~ROSCameraStream()
bool init(std::string &camera_topic)
inline bool ok()
inline CircularBuffer::iterator begin() const
inline CircularBuffer::iterator end() const
inline size_t size() const
inline cam_model_ptr_t getCameraModelPtr() const
inline bool isGeometryConst() const
inline int rows() const
inline int cols() const
virtual CamFrameConstPtr getFrameFromTime(ros::Time desired_time)

Returns a copy of the CameraFrame taken closest to the given time.

Parameters

time_t_ – The time to query.

virtual CamFrameConstPtr operator[](int i)

Returns a reference to the nth frame from the most recent. For example, frame_sequence[0] is the most recent frame, frame_sequence[-1] is the oldest frame, and frame_sequence[-2] is the second oldest frame.

PcdSubPubAlgorithm

template<typename output_T = pcl::PointXYZ, typename input_T = pcl::PointXYZ>
class PcdSubPubAlgorithm

Public Types

template<typename T = pcl::PointXYZ>
using PCD = pcl::PointCloud<T>

Public Functions

inline PcdSubPubAlgorithm(ros::NodeHandle nh, std::string input_pcd_topic, std::string output_pcd_topic)
inline bool activated()
inline bool ok()
inline void switchActivation()
inline PcdSubPubAlgorithm()
inline ~PcdSubPubAlgorithm()

ImagePublisher

Functions
class ImagePublisher

Abstraction for publishing images without the necessary boilerplate. All that's needed to publish an image is a cv::Mat.

Public Functions

ImagePublisher(const ros::NodeHandle &nh, const std::string &topic, const std::string &encoding = "bgr8", int queue_size = 1)

Creates an ImagePublisher in the namespace of nh.

Parameters
  • nh – NodeHandle to have node in the correct namespace.

  • topic – Camera topic to publish images to.

  • encoding – Type of image encoding.

  • queue_size – Size of the queue for the publisher.

void publish(cv::Mat &image_arg)

Publishes an OpenCV image.

Parameters

image_arg – OpenCV image to publish.

ImageSubscriber

class ImageSubscriber

Abstraction for subscribing to image topics and their info. This returns a copy or a reference of the image, depending on what you specify in the constructor. The camera info contains valuable information, such as the camera matrix, and the class will exit gracefully if it is not found.

Public Functions

template<typename T>
inline ImageSubscriber(const ros::NodeHandle &nh, const std::string &topic, void (T::* func)(cv::Mat&), T *object, bool use_copy = true, const std::string &encoding = "bgr8", int queue_size = 1)

Create an ImageSubscriber using a class member function as the function callback.

Parameters
  • nh – NodeHandle for the node that is subscribing to a camera topic.

  • topic – Contains the camera topic to subscribe to.

  • func – Function pointer to class member callback function.

  • object – Pointer to object that function belongs to.

  • use_copy – Whether to use a copy or reference of the converted image. If unsure use copy.

  • encoding – Type of encoding for camera topic.

  • queue_size – Queue size to use for subscriber.

inline ImageSubscriber(const ros::NodeHandle &nh, const std::string &topic, void (*func)(cv::Mat&), bool use_copy = true, const std::string &encoding = "bgr8", int queue_size = 1)

Create an ImageSubscriber using a normal function as the function callback.

Parameters
  • nh – NodeHandle for the node that is subscribing to a camera topic.

  • topic – Contains the camera topic to subscribe to.

  • func – Function pointer to callback function.

  • use_copy – Whether to use a copy or reference of the converted image. If unsure use copy.

  • encoding – Type of encoding for camera topic.

  • queue_size – Queue size to use for subscriber.

inline boost::optional<sensor_msgs::CameraInfo> waitForCameraInfo(unsigned int timeout = 10)

Safely returns the camera info if it is found. Waits for timeout seconds to receive the camera info.

Parameters

timeout – Time in seconds to wait for camera info to be received.

Returns

An optional value containing the data if available, or nothing if not.

inline bool waitForCameraModel(image_geometry::PinholeCameraModel &pinhole_camera, unsigned int timeout = 10)

Waits for camera info to be received and fills in the given PinholeCameraModel with the data from it.

Parameters
  • pinhole_camera – PinholeCameraModel to store the resulting camera model in.

  • timeout – Time in seconds to wait for camera info to be received.

Returns

true if data is received, false otherwise.

inline void infoCallback(const sensor_msgs::CameraInfo &info)

Function callback for camera info.

Parameters

info – Camera info message from ROS.

inline void imageCallbackCopy(const sensor_msgs::ImageConstPtr &image)

Function callback for the image copy.

Parameters

image – Image message from ROS.

inline void imageCallbackReference(const sensor_msgs::ImageConstPtr &image)

Function callback for the image reference.

Parameters

image – Image message from ROS.

inline boost::optional<std_msgs::Time> getLastImageTime()

Returns the timestamp for the last image.

Returns

An optional value containing the time the last image was received.