@pra-dan
Created October 27, 2021 10:26

Observations (for bigger bot)

  1. Related to Forward and Reverse Drive
# Drive codes
0x602: RIGHT DRIVE
0x601: LEFT DRIVE

# Acceleration Codes
Min V: 0xCC = 204
Max V: 0x6A = 106 # faster
  2. In forward drive, a smaller value = larger acceleration.
  3. In reverse drive, a larger value = larger acceleration (see the sketch after this list).
  4. In the current chassis version, the front/back of the bot is decided using its arm. The arm is always assumed to be the left arm.
  5. When unaligned (and post reset condition), the arm measures the distance to the ground as >1200, and as >900 when aligned. The second threshold is needed when the cleaning has to be stopped and the panel end needs to be detected. The two thresholds are never used simultaneously.
  6. Avoid print statements on devices that are accessed via nested SSH (BB, for example). They lead to network choking, and the processes/threads within the device are affected.
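
A minimal sketch of how the drive and acceleration conventions above combine. The helper name velocity_byte and the assumption that reverse drive reuses the same byte range (with the mapping inverted) are mine, not from the firmware:

RIGHT_DRIVE, LEFT_DRIVE = 0x602, 0x601
MIN_V = 0xCC  # 204, slowest
MAX_V = 0x6A  # 106, fastest

def velocity_byte(accel_fraction, forward=True):
    """Map accel_fraction in [0, 1] to a velocity byte.

    Forward drive: smaller byte => larger acceleration.
    Reverse drive: larger byte => larger acceleration (assumed symmetric range).
    """
    accel_fraction = max(0.0, min(1.0, accel_fraction))
    span = MIN_V - MAX_V  # 204 - 106 = 98
    if forward:
        return round(MIN_V - accel_fraction * span)  # 204 (slow) -> 106 (fast)
    return round(MAX_V + accel_fraction * span)      # 106 (slow) -> 204 (fast)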

Algorithm conventions adopted

  1. The error is calculated such that the error's polarity denotes:
+ve error: Robot is too left
-ve error: Robot is too right

This is irrespective of the direction of the motion.

  2. The Python modules/scripts are named so as to include forward.py or backward.py. The bot only moves in the direction of its vision; anti-motion observations (i.e., look forward move backward, and vice versa) have been discontinued. The names are synonymous with
forward.py -> left.py
backward.py -> right.py
  3. During message encoding/decoding in the MQ, the following methods are used and suggested:
1a) Method 1: Single value
#encode to send
FLAG = 99
socket.send(bytes([FLAG]))

#decode to read
messageData = socket.recv()
message = int.from_bytes(messageData,"little")

1b) Method 2: Multiple values
#encode to send
socket.send(bytes([99,99]))
#decode to read
messageByte = socket.recv()
m = memoryview(messageByte)
PNL_OR_GND, UN_OR_ALIGNED = m[0], m[1]



2) Specific to Publisher Subscriber architecture
2b) Multiple values (messagedata1 and messagedata2)
topic = "10006" # aka topicfilter on the subscriber side
socket.send_string("%s %d %d" % (topic, messagedata1, messagedata2))

# decode
topicfilter = b'10006'
socket.setsockopt(zmq.SUBSCRIBE, topicfilter)
string = socket.recv()
topic, _messagedata1, _messagedata2 = string.split()
messagedata1, messagedata2 = int(_messagedata1), int(_messagedata2)

Note on Model Optimisation

Various attempts have been made to optimise the model and get faster inference speeds. Nonetheless, we still fail.

  1. h5 -> Uff & Uff inference using TRT8: The first half of the problem is done; I am facing a weird error in the second part, for which I have submitted issues on the respective platforms.

  2. h5 -> pb -> onnx -> trt engine (.plan): The docs demonstrate this method for ResNet50-based models, while ResNet101-based models are not supported in the versions of Keras used in the demo. I retrained our model using ResNet50 but the problem persists. So I found a kind of hack to obtain the frozen graph (.pb file); a sketch of the usual trick is below. This hack leaves the model's input and output dims unclear, which leads to further problems when converting to and inferencing the ONNX model (which I also obtained).
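
For reference, a sketch of the common TF2 freezing trick. Hedged: this may not be the exact hack used here, Mask R-CNN's Keras graph has several inputs (each of which would need its own TensorSpec), and "model.h5" is a placeholder path:

import tensorflow as tf
from tensorflow.python.framework.convert_to_constants import convert_variables_to_constants_v2

model = tf.keras.models.load_model("model.h5", compile=False)  # placeholder path
# wrap the Keras model in a concrete function so its variables can be frozen
concrete = tf.function(lambda x: model(x)).get_concrete_function(
    tf.TensorSpec(model.inputs[0].shape, model.inputs[0].dtype))
frozen = convert_variables_to_constants_v2(concrete)
print("inputs:", frozen.inputs, "outputs:", frozen.outputs)  # the dims the ONNX/TRT steps need
tf.io.write_graph(frozen.graph, ".", "frozen_graph.pb", as_text=False)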

Results:

  • Uff:

    | precision | time             |
    |-----------|------------------|
    | fp32      | f                |
    | fp16      | 252.326 ms/frame |
    | int8      | 149.858 ms/frame |
  • ONNX: Not supported yet

PSEUDO-CODE

o_actuation (o_actuation_test.py)

  • Get live LiDAR values (via a thread updating globals via get_lidar_vals())

  • Calculate errorL and errorR for left and right pair of LiDARs (val_top_lidar - val_bottom_lidar)

  • Take their avg (avg_error) only if their difference (delta_LR) is less than a threshold (threshold_LR)

  • send abs(delta_LR) to min_max_normalizer() and compute acceleration. Use

    Upper Actuation: [0,63]   # 0: STOP
    Lower Actuation: [64,126] # 64: STOP
    therefore, MIN_ACCELERATION: 0, MAX_ACCELERATION: 62

    and

    MIN_DELTA_LR: 0 MAX_DELTA_LR:

    and rescaling (min-max normalisation; https://en.wikipedia.org/wiki/Feature_scaling), as follows.

  • Rescale sample x, from [MIN_DELTA_LR, MAX_DELTA_LR] to [MIN_ACCELERATION, MAX_ACCELERATION], i.e., acceleration = MIN_ACCELERATION + ((x-MIN_DELTA_LR)*(MAX_ACCELERATION-MIN_ACCELERATION)) / (MAX_DELTA_LR-MIN_DELTA_LR)

  • If delta_LR is +ve, use Upper Actuation and message = acceleration. Else, message = 64 + acceleration (see the sketch after this list).

  • Send message
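
A minimal, runnable sketch of the rescaling and message construction above, using the names from the pseudo-code. MAX_DELTA_LR's value is elided in the notes, so a placeholder is used:

MIN_ACCELERATION, MAX_ACCELERATION = 0, 62  # Upper Actuation span (0/64 are STOP)
MIN_DELTA_LR, MAX_DELTA_LR = 0, 400         # MAX_DELTA_LR: placeholder, value elided above

def min_max_normalizer(x):
    """Rescale x from [MIN_DELTA_LR, MAX_DELTA_LR] to [MIN_ACCELERATION, MAX_ACCELERATION]."""
    x = max(MIN_DELTA_LR, min(MAX_DELTA_LR, x))
    return MIN_ACCELERATION + ((x - MIN_DELTA_LR) * (MAX_ACCELERATION - MIN_ACCELERATION)) \
           / (MAX_DELTA_LR - MIN_DELTA_LR)

def actuation_message(delta_LR):
    acceleration = round(min_max_normalizer(abs(delta_LR)))
    # +ve delta -> Upper Actuation [0,63]; otherwise Lower Actuation [64,126]
    return acceleration if delta_LR > 0 else 64 + acceleration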

v_actuation (v_actuation.py)

A note on the actions needed when no actuations are required, and the conditions that lead to them:

There can only be 3 cases for this condition to arise:

  i. The arm is aligned & arm is over Panel (Panel Ends) # 0&1
  ii. The arm is unaligned & arm is over Ground # 1&0
  iii. The arm is aligned & arm is over Ground # 0&0

  • All cases are detected via the flag FLAG_PNL_OR_GND (0/1), and avg_dist jumping out of the [OPTIMAL_DISTANCE-BUFFER, OPTIMAL_DISTANCE+BUFFER] range (see the sketch after this list).

  • Occurrence of these cases: Case i aka Case 6: this occurs in 2 conditions: (a) when the bot has entered the row and has completed the arm alignment but is still immobile (Jetson waits for BBM to transmit whether the arm (BBM+BBS) is aligned), OR (b) when cleaning is ON and the arm is perfectly aligned and cleaning. Note that this also includes the instant when the panel is just getting over.

    • Case ii: This case is avoided around line 240 using FLAG_PNL_OR_GND.

    • Case iii: Intuitively, this case arises as soon as the cleaning is done and the row gets over, usually immediately after case i. It occurs very rarely because, as soon as the lidars hit the ground, they tend to lose their alignment status; hence the arm is no longer aligned and this condition fails.

  • Handling of these cases:

    • Case i aka Case 6: In case (a), we are supposed to update the alignment status, viz. FLAG_BBM_ALIGNED, so that Jetson can begin navigation (c makes sure we have a consistent reading and helps toggle the flag). For case (b), we don't do much here; we just wait for d (=0) to enter the check (near line 260), where FLAG_PANEL_END tells us whether a reset is needed.

    • Case iii: This is handled just like case i.
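
A hedged sketch of the case split above, reading the '# a&b' comments as (unaligned bit, over-panel bit); the exact flag encoding in v_actuation.py may differ:

def no_actuation_case(unaligned, over_panel):
    """Return which no-actuation case applies, per the (unaligned & over_panel) bits."""
    if not unaligned and over_panel:      # 0&1
        return "case i: panel end / waiting on the alignment handshake"
    if unaligned and not over_panel:      # 1&0
        return "case ii: skipped around line 240 via FLAG_PNL_OR_GND"
    if not unaligned and not over_panel:  # 0&0
        return "case iii: row just got over; handled like case i"
    return "normal operation: keep actuating"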

row_entry

  • Jetson approaches the panel and launches server for BBM to connect to.
  • BBM (within v_actuation.py) connects to this server and starts sending Panel/Ground status to Jetson.
  • Jetson waits for MAX_CONSECUTIVE_ENTRIES (=20) consecutive panel readings before concluding the arm is over the panel (see the sketch after this list).
  • The Jetson program ends; BBM continues to align the arm and sends the alignment status to Jetson via FLAG_ARM_ALIGNED.
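
A sketch of the Jetson-side wait, using the single-byte encoding from the MQ section. The bind address, the flag polarity (0 = PNL, given that FLAG_PNL_OR_GND=1 means GND in the tables below) and the ack byte are assumptions:

import zmq

MAX_CONSECUTIVE_ENTRIES = 20

ctx = zmq.Context()
sock = ctx.socket(zmq.REP)
sock.bind("tcp://10.0.0.1:6677")  # Jetson server; port from the flag tables below

consecutive = 0
while consecutive < MAX_CONSECUTIVE_ENTRIES:
    flag_pnl_or_gnd = int.from_bytes(sock.recv(), "little")  # 0 = PNL, 1 = GND (assumed)
    sock.send(b"\x00")  # a REP socket must answer every request
    consecutive = consecutive + 1 if flag_pnl_or_gnd == 0 else 0

# arm is reliably over the panel; this program ends and BBM aligns the arm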

Run back/left

  • Jetson re-establishes server. Meanwhile, BBM is performing alignment. Jetson program waits for alignment to finish (via FLAG_ARM_ALIGNED) before executing anything.

  • Program terminates after

    • the bot has entered the row, moved > MIN_STEPS_TO_APPROACH_PANEL steps, and detected GND for MAX_CONSECUTIVE_ENTRIES measurements.

    OR

    • mask detection error count exceeds threshold error_counts (= 40).

look forward move back (deprecated)

  • The module was needed when the 'run back' module terminated due to mask errors instead of detecting the panel end via lidar_dist. So, unlike the previous module, if it does not detect masks, it moves straight instead of terminating. It only terminates when the panel gets over.

Arm reset

  • In v_actuation, it's time for the arm reset. BBM does this by confirming 2 conditions:

    • Did it actually sense GND, or is it just some inter-panel gap, or just the entry of the panel where the bot is yet to approach the panel for alignment? This is checked via count_did_arm_alignment_ever_occur.
    • Did the lidar sense GND consecutively? This is done via c, d and FLAG_PANEL_END.
  • All flags are reset and BBM goes back to waiting (via FLAG_PNL_OR_GND) for the arm to come over a panel again, so that it can align it. But this isn't possible, because at this time Jetson is running 'run forward', which has no server on port 6677, meaning the function (client_BBB) can't update FLAG_PNL_OR_GND. We still need lidar values to detect whether we have travelled back to the start of the row, so that the bot can take a 90 deg turn and move on to another row or back to the dock. So we launch another connection instead: BBM and Jetson communicate PNL/GND on port 8899 😎 (a sketch of this second link follows).
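
A sketch of the BBM side of that second link; the socket type (REQ) and the payload format follow the conventions above and are assumptions:

import zmq

ctx = zmq.Context()
sock = ctx.socket(zmq.REQ)
sock.connect("tcp://10.0.0.1:8899")  # the Jetson end binds here during 'run forward'

def report_pnl_or_gnd(flag_pnl_or_gnd):
    """Send the current PNL/GND flag (1 = GND) and wait for Jetson's ack."""
    sock.send(bytes([flag_pnl_or_gnd]))
    sock.recv()  # REQ/REP lockstep: must read the reply before the next send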


Message Queue

Architecture:

We use ZeroMQ for our purpose.

The built-in core ZeroMQ patterns are:

  • Request-reply (Server-Client), which connects a set of clients to a set of services. This is a remote procedure call and task distribution pattern, useful for implementing RPC mechanisms.
  • Pub-sub, which connects a set of publishers to a set of subscribers. This is a data distribution pattern, useful for notifying updates to an audience of recipients.
  • Pipeline (Push/Pull), which connects nodes in a fan-out/fan-in pattern that can have multiple steps and loops. This is a parallel task distribution and collection pattern, useful for propagating data updates.
  • Exclusive pair (Pair), which connects two sockets exclusively. This is a pattern for connecting two threads in a process, not to be confused with "normal" pairs of sockets, useful for implementing IPC mechanisms.
  • All LiDARs communicate with their hosts (BBx) in a fan-in/pipeline pattern.
  • Jetson-BBM and BBM-BBS communicate via Request-reply pattern.
  • Our implementation of Push-Pull architecture has been tested out in the files:
    • collector: Collects data from all agent(s), lidars in our case.
    • agent: Each agent simulates a sensor that wants to push its data to a common server (Jetson in our case). We also collect the lidar data from all the agents/lidars and plot it in real time. (A minimal sketch follows.)
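
A minimal sketch in the spirit of those collector/agent files; the port and payload format are assumptions, and read_lidar() is a hypothetical stand-in for the actual driver call:

import zmq

def agent(lidar_id, port=5556):
    """One agent per lidar: PUSH readings to the common collector."""
    ctx = zmq.Context()
    push = ctx.socket(zmq.PUSH)
    push.connect("tcp://10.0.0.1:%d" % port)
    while True:
        push.send_string("%d %d" % (lidar_id, read_lidar()))  # read_lidar(): hypothetical

def collector(port=5556):
    """Fan-in: a single PULL socket receives from all agents."""
    ctx = zmq.Context()
    pull = ctx.socket(zmq.PULL)
    pull.bind("tcp://*:%d" % port)
    while True:
        lidar_id, dist = map(int, pull.recv_string().split())
        print(lidar_id, dist)  # e.g. feed a live plot here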

Architecture-1

The intra-device communication is carried on the following architecture:

Architecture-2

Pending Features:

  • Add Request-reply to FLAG_ARM_ALIGNED, FLAG_ALIGNED, FLAG_PANEL_END, FLAG_PNL_OR_GND, so that their senders get an acknowledgement after each receive.

Flag transactions per module

Row Entry

Used during row entry, run back/right.

|        | Server: Jetson @10.0.0.1               | Client: BBM @10.0.0.3 |
|--------|----------------------------------------|-----------------------|
| File   | pid_bada_chutki_row_entry.py           | v_actuation.py        |
| Port   | 6677                                   | 6677                  |
| Module |                                        | client_BBB            |
| Flags  | in: FLAG_PNL_OR_GND=1 (GND by default) | out: FLAG_PNL_OR_GND=1 |
|        | in: FLAG_ARM_ALIGNED                   | out: [FLAG_BBS_ALIGNED = 0, FLAG_BBM_ALIGNED = 0] -> FLAG_ALIGNED = 4 |
|        | out: TOGGLE_ACTUATIONS                 | out: FLAG_PANEL_END = 0 |
|        |                                        | in: TOGGLE_ACTUATIONS |

Row Exit (Panel end to column)

Used during run forward/left. It is assumed the arm is over the panel (end); the bot starts to travel back to the column and stops only when the lidars detect GND.

|        | Server: Jetson @10.0.0.1          | Client: BBM @10.0.0.3  |
|--------|-----------------------------------|------------------------|
| File   | pid_bada_chutki_row_entry.py      | v_actuation.py         |
| Port   | 8899                              | 8899                   |
| Module |                                   | client_BBB             |
| Flags  | in: lidar_dist=0 (PNL by default) | out: FLAG_PNL_OR_GND=1 |

Miscellaneous

Used repetitively

|        | Server: BBM @10.0.0.3                        | Client: BBS @10.0.0.2                          |
|--------|----------------------------------------------|------------------------------------------------|
| File   | v_actuation.py                               | o_actuation.py                                 |
| Port   | 5555                                         | 5555                                           |
| Module | BBM                                          | BBS                                            |
| Flags  | in: FLAG_BBS_LIVE=0 (reset after every iter) | out: FLAG_BBS_LIVE = 0 (Offline)               |
|        | in: FLAG_BBS_ALIGNED                         | out: FLAG_BBS_ALIGNED=0 (Unaligned by default) |
|        | out: FLAG_PNL_OR_GND=1 (GND)                 | in: FLAG_PNL_OR_GND=1 (GND)                    |
|        | out: FLAG_PANEL_END=0 (Not over)             | in: FLAG_PANEL_END=0 (Still Cleaning)          |

Data Packaging:

The images are sent to the inferencing server as follows:

# Client
import requests
import json
import cv2
import jsonpickle

addr = 'http://localhost:5000'
test_url = addr + '/test'

# prepare headers for http request
content_type = 'image/png'
headers = {'content-type': content_type}

# Read data from image
img = cv2.imread('mutty.png')

# OR

# from camera feed

# encode image as png
_, img_encoded = cv2.imencode('.png', img)
# send http request with image and receive response
response = requests.post(test_url, data=img_encoded.tobytes(), headers=headers)

# decode response
poly_points = jsonpickle.decode(response.text)
if len(poly_points) < 10:
  print("No mask detected")
  # continue to the next frame (when inside the camera loop)
else:
  pass  # use mask



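The server half never made it into these notes; a minimal Flask sketch matched to the client above (route /test, jsonpickle-encoded response) might look like the following. infer_mask() is a hypothetical stand-in for the actual model call:

# Server
from flask import Flask, request, Response
import jsonpickle
import numpy as np
import cv2

app = Flask(__name__)

@app.route('/test', methods=['POST'])
def test():
    # rebuild the image from the raw PNG bytes sent by the client
    img = cv2.imdecode(np.frombuffer(request.data, np.uint8), cv2.IMREAD_COLOR)
    poly_points = infer_mask(img)  # infer_mask(): hypothetical model call
    return Response(response=jsonpickle.encode(poly_points),
                    status=200, mimetype="application/json")

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)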

Mask Overlay

The usual method to do so is defined in the function color_splash(). The tensor server also has this functionality built in and saves the masks in save_dir = field_results/maskdump/

import numpy as np
import skimage.color

def color_splash(image, mask):
    """Apply color splash effect.
    image: RGB image [height, width, 3]
    mask: instance segmentation mask [height, width, instance count]

    Returns result image.
    """
    # Make a grayscale copy of the image. The grayscale copy still
    # has 3 RGB channels, though.
    gray = skimage.color.gray2rgb(skimage.color.rgb2gray(image)) * 255
    # Copy color pixels from the original color image where mask is set
    #mask = (np.sum(mask, -1, keepdims=True) >= 1)
    if mask.shape[-1] > 0:
        # We're treating all instances as one, so collapse the mask into one layer
        mask = (np.sum(mask, -1, keepdims=True) >= 1)
        splash = np.where(mask, image, gray*0.5).astype(np.uint8)
    else:
        splash = gray.astype(np.uint8)
    return splash

Boost the clocks

sudo nvpmodel -m 0   # switch to the max-performance power mode
sudo jetson_clocks   # pin CPU/GPU/EMC clocks to their maximum

Error Codes

| Code | Error                    | Description |
|------|--------------------------|-------------|
| 99   | Regular Error            | Errors exceeding some threshold but currently within acceptable frequency. We usually simply continue until it leads to error 9992. |
| 9991 | Mask not inferred        | Mainly due to the panel being absent in the LoS. |
| 9992 | Error count out of bound | Too many outlier errors (> MAX_ERROR), usually due to distorted/noisy masks. |
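
A sketch of how these codes might fit together in the control loop; the function and the per-frame threshold value are assumptions (MAX_ERROR mirrors error_counts (= 40) from 'Run back/left'):

MAX_ERROR = 40  # mirrors error_counts (= 40) in 'Run back/left'

def classify_error(mask_found, error, error_count, threshold=50):
    """Return (code, updated error_count); threshold is a placeholder value."""
    if not mask_found:
        return 9991, error_count      # mask not inferred: panel absent in LoS
    if abs(error) > threshold:
        error_count += 1
        if error_count > MAX_ERROR:
            return 9992, error_count  # error count out of bound: bail out
        return 99, error_count        # regular error: simply continue
    return 0, error_count             # no error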

References

Fiducial Pose Estimation based Docking

Miscellaneous

Image Processing breakthroughs in Soltrek

MaskRCNN Related

BB related

NVIDIA DeepStream & MaskRCNN TLT

Messaging Queue Stack

Arm Related

Hardware Related
