Swathi Jadav
Projects
Robotics, Deep Learning and Computer Vision Projects
MMPUG - Multi-Modal Perception Uber Good
This project is a heterogeneous, high-speed multi-robot system used to rapidly map, navigate, and perform search and rescue in unstructured environments.
My work focuses on the development, testing, and deployment of the SLAM and perception framework on a heterogeneous fleet of robots comprising fast-moving RC cars and a quadrupedal robot.
Project website
Autonomous Drone Exploration
Research focused on navigation and guidance for autonomous drone exploration, advised by Prof. Sebastian Scherer.
Multi-view Wide-Angle State Estimation and Reconstruction for Autonomous Flight
The goal is to estimate relative motion and create 360° depth maps to enable a drone to fly autonomously.
Implemented ORB-SLAM3 visual odometry for state estimation and reconstruction using the drone's six-camera stereo fisheye feed.
Worked on visual odometry for navigation as part of the autonomy stack for autonomous drones.
SuperVLOAM - Super Visual LiDAR Odometry and Mapping
A state-of-the-art technique for localizing and mapping an environment in real time using both camera and LiDAR sensors, fusing visual features with LiDAR data to estimate the robot's motion with higher precision.
Implemented a robust real-time ROS-based framework for accurate trajectory estimation, 3D mapping, and localization by fusing stereo RGB images and LiDAR data with ICP in a non-linear optimization framework, achieving an ATE of 1.773 m.
VLOAM used feature extractors such as Shi-Tomasi, ORB, BRIEF, and FAST; descriptor matching was done using brute-force, L2-norm, and FLANN-based matching.
Augmented the feature extraction and matching pipeline with the SuperPoint descriptor and the SuperGlue matching algorithm.
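For illustration, below is a minimal sketch of the classical feature extraction and matching step (ORB descriptors with brute-force Hamming matching in OpenCV) that SuperPoint/SuperGlue later replaced; the parameters are illustrative, not the project's actual configuration.

```python
# Minimal sketch: ORB feature extraction + brute-force matching between
# two consecutive frames (OpenCV). Parameters are illustrative.
import cv2

def match_orb_features(img_prev, img_curr, n_features=1000, ratio=0.75):
    """Extract ORB keypoints/descriptors and match them across two frames."""
    orb = cv2.ORB_create(nfeatures=n_features)
    kp1, des1 = orb.detectAndCompute(img_prev, None)
    kp2, des2 = orb.detectAndCompute(img_curr, None)

    # ORB descriptors are binary, so Hamming distance is the right metric.
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
    knn_matches = matcher.knnMatch(des1, des2, k=2)

    # Lowe's ratio test to discard ambiguous matches.
    good = []
    for pair in knn_matches:
        if len(pair) == 2 and pair[0].distance < ratio * pair[1].distance:
            good.append(pair[0])

    # Return matched pixel coordinates for downstream pose estimation.
    pts_prev = [kp1[m.queryIdx].pt for m in good]
    pts_curr = [kp2[m.trainIdx].pt for m in good]
    return pts_prev, pts_curr
```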
Perception in Snow-Covered Environment
Modeled a comprehensive perception module for progressive LiDAR adaptation for road detection in adverse, snow-covered conditions through sensor fusion of camera and LiDAR data on the KITTI and Canadian Adverse Driving Conditions (CADC) datasets.
This approach adapts LiDAR information into visual image-based road detection and improves detection performance.
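As a rough illustration of the camera-LiDAR fusion step, the sketch below projects LiDAR points into the camera image plane using KITTI-style calibration matrices; the matrix names follow the KITTI devkit convention, and the snippet is not the project's actual adaptation module.

```python
# Minimal sketch: project LiDAR points into the camera image plane using
# KITTI-style calibration (P2: camera projection, R0_rect: rectification,
# Tr_velo_to_cam: LiDAR-to-camera extrinsics). Illustrative only.
import numpy as np

def project_lidar_to_image(points_velo, P2, R0_rect, Tr_velo_to_cam):
    """points_velo: (N, 3) LiDAR points -> (M, 2) pixel coords and depths."""
    n = points_velo.shape[0]
    pts_h = np.hstack([points_velo, np.ones((n, 1))])       # (N, 4) homogeneous

    # LiDAR frame -> rectified camera frame.
    pts_cam = R0_rect @ (Tr_velo_to_cam @ pts_h.T)           # (3, N)

    # Keep only points in front of the camera.
    in_front = pts_cam[2, :] > 0.1
    pts_cam = pts_cam[:, in_front]

    # Rectified camera frame -> image plane.
    pts_img = P2 @ np.vstack([pts_cam, np.ones((1, pts_cam.shape[1]))])
    pts_img = pts_img[:2, :] / pts_img[2, :]                 # (2, M) pixels
    return pts_img.T, pts_cam[2, :]                          # pixels, depths
```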
Super Deep SORT - Simple Online and Realtime Tracking
Combined the state-of-the-art multi-object tracking algorithm Deep SORT with the SuperGlue algorithm to enhance object tracking in the presence of occlusion.
Our implementation handles occlusion cases robustly and improves object tracking, matching, and re-association of tracks by 2% compared to the baseline.
The algorithm uses YOLOv7 for object detection. Object state estimation and matching use three cost metrics:
SuperGlue Cost Metric
Cosine Similarity Metric
IoU cost metric using the Kalman-predicted bounding boxes
Using the combined cost matrix, matched tracks are re-assigned their previous identities and unmatched tracks are assigned new IDs.
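Below is a minimal sketch of how the three cost metrics can be fused and the assignment solved with the Hungarian algorithm; the weights, gating threshold, and function names are hypothetical, not the exact implementation.

```python
# Minimal sketch: fuse three track-detection cost matrices and solve the
# assignment with the Hungarian algorithm. Weights/threshold are illustrative.
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(superglue_cost, cosine_cost, iou_cost,
              weights=(0.4, 0.4, 0.2), max_cost=0.7):
    """Each cost matrix has shape (num_tracks, num_detections), values in [0, 1]."""
    w1, w2, w3 = weights
    cost = w1 * superglue_cost + w2 * cosine_cost + w3 * iou_cost

    # Hungarian algorithm: minimum-cost one-to-one matching.
    track_idx, det_idx = linear_sum_assignment(cost)

    matches = []
    unmatched_tracks = set(range(cost.shape[0]))
    unmatched_dets = set(range(cost.shape[1]))
    for t, d in zip(track_idx, det_idx):
        # Gate: reject pairs whose fused cost is too high.
        if cost[t, d] <= max_cost:
            matches.append((t, d))
            unmatched_tracks.discard(t)
            unmatched_dets.discard(d)

    # Matched tracks keep their previous IDs; unmatched detections get new IDs.
    return matches, sorted(unmatched_tracks), sorted(unmatched_dets)
```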
This pipeline was extended to track apples in real time in an orchard, counting them and determining their locations so a robot could pick them.
HEXA - Human Demo Augmented Explorer and Achiever
Extended the state-of-the-art LEXA benchmark (Latent Explorer and Achiever) [Baseline Paper] - a unified solution that learns a world model from image inputs and uses it to train an explorer and an achiever policy from imagined rollouts, augmented with human demonstrations. This project was advised by Prof. Deepak Pathak.
The aim of the project was to augment the curiosity-based exploring and achieving LEXA agent with human teleoperation-based demonstrations, so the agent can explore and learn low-level skills (sub-tasks) needed to perform long-horizon tasks.
This approach guides LEXA's exploration toward discovering goals that are useful for the task carried out in the demonstration. We use the Franka Kitchen demonstrations from D4RL to augment the exploration, and we extended LEXA to long-horizon tasks by letting exploration start from different states.
DeepSpace
Led a team of 2 in the conceptualization and development of the DeepSpace platform, a framework for adaptive real-time programming, control, monitoring, and task sequencing of a novel vision-enabled pick-and-place general-purpose robotic arm.
Built a point cloud preprocessing pipeline with Intel RealSense, PCL, and gRPC for filtering and downsampling point cloud data for real-time streaming (see the sketch after this list).
Conceptualized and integrated the system architecture for sub-system communication and data storage.
Integrated real-time point cloud streaming from the RealSense camera over gRPC for monitoring, programming, and task sequencing in Unity.
Wrote production-level, low-latency Python and C# code for communicating with the ROS-based controller; extended the existing SDK to support gRPC for a 6-DOF UFactory xArm.
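The filtering and downsampling stage looks roughly like the sketch below; the actual pipeline used PCL, but an equivalent Open3D version conveys the idea (voxel-grid downsampling followed by statistical outlier removal), with illustrative parameters.

```python
# Minimal sketch of the filter-and-downsample stage before streaming.
# The project used PCL; this equivalent uses Open3D for brevity.
import numpy as np
import open3d as o3d

def preprocess_cloud(xyz, voxel_size=0.01, nb_neighbors=20, std_ratio=2.0):
    """xyz: (N, 3) array of points from a RealSense frame (illustrative input)."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(xyz)

    # Voxel-grid downsampling cuts the point count for low-latency streaming.
    pcd = pcd.voxel_down_sample(voxel_size=voxel_size)

    # Statistical outlier removal filters depth noise and stray points.
    pcd, _ = pcd.remove_statistical_outlier(nb_neighbors=nb_neighbors,
                                            std_ratio=std_ratio)

    # Flatten back to a numpy array, ready to serialize into a gRPC message.
    return np.asarray(pcd.points, dtype=np.float32)
```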
Weakly Supervised Deep Detection Network
The project exploits pre-trained CNNs for weakly supervised deep detection of image regions, performing simultaneous region selection and classification using only image-level annotations.
A novel end-to-end method that classifies and predicts bounding boxes using AlexNet on PASCAL VOC data through spatial pyramid pooling (a rough sketch follows the references below).
This project is based on the following papers: [Paper 1] and [Paper 2].
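Below is a minimal sketch of spatial pyramid pooling in PyTorch, which pools convolutional features at several grid resolutions and concatenates them into a fixed-length vector regardless of region size; the pyramid levels are illustrative, not the configuration used in the project.

```python
# Minimal sketch: spatial pyramid pooling over a conv feature map (PyTorch).
# Pyramid levels are illustrative; output length is fixed regardless of input size.
import torch
import torch.nn as nn

class SpatialPyramidPooling(nn.Module):
    def __init__(self, levels=(1, 2, 4)):
        super().__init__()
        # One adaptive max-pool per pyramid level (1x1, 2x2, 4x4 grids).
        self.pools = nn.ModuleList([nn.AdaptiveMaxPool2d(k) for k in levels])

    def forward(self, x):
        # x: (batch, channels, H, W) conv features for an image or region.
        feats = [pool(x).flatten(start_dim=1) for pool in self.pools]
        return torch.cat(feats, dim=1)  # (batch, channels * sum(k*k))

# Example: 256-channel conv features -> 256 * (1 + 4 + 16) = 5376-dim vector.
# spp = SpatialPyramidPooling(); out = spp(torch.randn(2, 256, 13, 17))
```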
Monte Carlo Localization - Particle Filter for Robot Localization
Implemented a particle filter to localize a robot in an indoor environment.
Implemented the motion model using odometry readings and the sensor model using LiDAR scan readings with a ray-tracing algorithm.
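Below is a minimal sketch of one particle filter iteration; the motion and sensor models here are simple stand-ins (noisy odometry propagation and a placeholder ray-cast likelihood), not the project's actual models.

```python
# Minimal sketch of one particle filter iteration: propagate particles with a
# noisy odometry motion model, weight them with a sensor model, then resample.
import numpy as np

def particle_filter_step(particles, odom_delta, scan, sensor_model,
                         motion_noise=(0.02, 0.02, 0.01)):
    """particles: (N, 3) array of [x, y, theta]; odom_delta: [dx, dy, dtheta]."""
    n = particles.shape[0]

    # Motion model: rotate the odometry delta into each particle's frame and
    # add Gaussian noise (a simple stand-in for the project's motion model).
    dx, dy, dth = odom_delta
    cos_t, sin_t = np.cos(particles[:, 2]), np.sin(particles[:, 2])
    moved = np.empty_like(particles)
    moved[:, 0] = particles[:, 0] + cos_t * dx - sin_t * dy
    moved[:, 1] = particles[:, 1] + sin_t * dx + cos_t * dy
    moved[:, 2] = particles[:, 2] + dth
    particles = moved + np.random.normal(0.0, motion_noise, size=(n, 3))

    # Sensor model: weight each particle by how well a ray-cast from its pose
    # explains the LiDAR scan (sensor_model is assumed to return a likelihood).
    weights = np.array([sensor_model(p, scan) for p in particles])
    weights = weights / (weights.sum() + 1e-12)

    # Systematic resampling to focus particles on likely poses.
    positions = (np.arange(n) + np.random.uniform()) / n
    indices = np.searchsorted(np.cumsum(weights), positions)
    return particles[indices]
```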
Visual Question Answering
Implemented a Multi-Modal Visual Question Answering model using a pre-trained RoBERTa and TransformerNet architecture.
Incorporated self-attention and cross-attention to aid interactions between textual and visual features.
Achieved an accuracy of 67.62%
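Below is a minimal sketch of the cross-attention interaction that lets text tokens attend to visual features (PyTorch); dimensions and module names are illustrative, not the actual model.

```python
# Minimal sketch: cross-attention letting text tokens attend to visual features.
# Dimensions are illustrative; the real model fuses RoBERTa and visual encoders.
import torch
import torch.nn as nn

class TextToImageCrossAttention(nn.Module):
    def __init__(self, d_model=768, n_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.norm = nn.LayerNorm(d_model)

    def forward(self, text_feats, image_feats):
        # text_feats: (B, T, d) token embeddings; image_feats: (B, R, d) regions.
        # Queries come from text, keys/values from the image.
        attended, _ = self.attn(query=text_feats, key=image_feats, value=image_feats)
        return self.norm(text_feats + attended)  # residual connection + norm

# fused = TextToImageCrossAttention()(torch.randn(2, 20, 768), torch.randn(2, 49, 768))
```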
FACE CLASSIFICATION AND VERIFICATION USING CNNs
Implemented ResNet-34 and ResNet-50 from scratch for classification on the VGGFace2 dataset. Utilized triplet loss to increase face recognition performance.
Implemented the ConvNeXt-Tiny architecture for classification and verification.
Experimented with ensemble methods to improve accuracy, achieving an accuracy of 94.5%.
Fine-tuned the model with triplet and center loss.
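Below is a minimal sketch of the triplet loss used to pull embeddings of the same identity together and push different identities apart; the margin is illustrative, and the center loss term is omitted.

```python
# Minimal sketch: triplet loss on L2-normalized face embeddings (PyTorch).
# Margin is illustrative; the actual training also combined center loss.
import torch
import torch.nn.functional as F

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Each input: (B, D) embeddings; positive shares the anchor's identity."""
    anchor = F.normalize(anchor, dim=1)
    positive = F.normalize(positive, dim=1)
    negative = F.normalize(negative, dim=1)

    pos_dist = (anchor - positive).pow(2).sum(dim=1)   # squared L2 distance
    neg_dist = (anchor - negative).pow(2).sum(dim=1)

    # Penalize whenever the positive is not closer than the negative by `margin`.
    return F.relu(pos_dist - neg_dist + margin).mean()
```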
Speech Steganography Using Deep Learning
Implemented a deep learning network that explored the use of neural networks as steganographic functions for speech data.
The objective was to conceal multiple messages in a single carrier using multiple decoders or a single conditional decoder, such that the embedding is unnoticeable to human listeners and the decoded messages remain highly intelligible.
Learning Strategies for Unsupervised Adaptation on Test Set
Research project on estimating the shift in alignment between the test and train datasets to reduce the distribution mismatch that leads to poorer classification.
Proposes an affine transformation to help the model better fit the test data.
Learning strategies were compared against the Audio Spectrogram Transformer (AST) model as a baseline on the AudioSet dataset.
ATTENTION-BASED END-TO-END SPEECH-TO-TEXT DEEP NEURAL NETWORK
Implemented a multi-head attention based speech-to-text deep neural network using the "Listen, Attend and Spell" paper as the baseline model.
Pre-processed speech data and transcripts for neural network input, and designed a depthwise convolution layer for feature extraction along with the embedding layers.
Results: Levenshtein distance (8.7); reached an A score in the Kaggle competition.
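Below is a minimal sketch of a depthwise-separable 1D convolution block of the kind used for speech feature extraction ahead of the attention model; channel sizes and kernel width are illustrative, not the actual model configuration.

```python
# Minimal sketch: depthwise-separable 1D convolution over speech features.
# The depthwise conv filters each channel independently, then a pointwise 1x1
# conv mixes channels. Sizes are illustrative, not the actual model config.
import torch
import torch.nn as nn

class DepthwiseSeparableConv1d(nn.Module):
    def __init__(self, in_ch=80, out_ch=256, kernel_size=5):
        super().__init__()
        self.depthwise = nn.Conv1d(in_ch, in_ch, kernel_size,
                                   padding=kernel_size // 2, groups=in_ch)
        self.pointwise = nn.Conv1d(in_ch, out_ch, kernel_size=1)
        self.act = nn.GELU()

    def forward(self, x):
        # x: (batch, n_mels, time) log-mel filterbank features.
        return self.act(self.pointwise(self.depthwise(x)))

# feats = DepthwiseSeparableConv1d()(torch.randn(4, 80, 1000))  # -> (4, 256, 1000)
```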
UTTERANCE TO PHONEME MAPPING USING RNNs
PHONEME LEVEL CLASSIFICATION OF SPEECH
Simulator Projects
Virtual Reality Based Airport Rescue and Fire-Fighting Simulator
Developed and deployed a virtual reality based simulator at an airport within three months to deliver realistic ARFF operation training by simulating vehicle behavior, accident-prone scenarios, and authentic controls.
The objective of this project was to reduce both safety risks and problems caused by ground operations on the airside.
The simulator doubled as an Airside Driving Permit test simulator, serving as a pre-test for airside drivers before they receive a driving permit. This test is conducted every six months as a recap of airside driving rules.
The simulator includes radio management, communication with Air Traffic Control, and all procedures, so that the trainee can make the right decisions in specific situations such as a fire hazard.
The trainee drives on a motion platform that makes them feel and react exactly as they would in real life, and is tested on several operational scenarios under an instructor's supervision.
The simulator was developed using the HTC VIVE headset to provide a virtual scene of the entire airport. A Leap Motion device was used to track the trainee's hands.
A 3-DOF motion platform along with a Logitech steering wheel and gear system was used to control the driving.
User input and data acquisition systems were developed and integrated with the Unity 3D application to simulate the real cabin controls.
Automotive Bus Driving Simulator
Developed a 3-DOF motion-based automotive simulator for training and familiarization with driving heavy vehicles, incorporating traffic incidents and vehicle malfunctions.
My contributions involved interfacing the actual steering and gear control feedback systems and building the traffic incident and vehicle malfunction modules.
Monorail Locomotive and Cockpit Training Simulator
Designed and developed a monorail replica training station to produce an accurate simulation of real traffic conditions, protection systems, cab signaling, breakdown protocol training, eco-driving, etc. Implemented cockpit controls, traction/brake, hydraulic, HMI, and door operation training modules.
Augmented Reality based Emergency Medical Training Simulation
This work presents an AR-based medical training prototype designed to train medical practitioners in performing endotracheal intubation (ETI), also known as laryngoscopy.
The system allows paramedics, pre-hospital personnel, and students to practice their skills without touching a real patient and provides visual feedback they could not otherwise obtain.
This simulator was developed to improve airway management training and respiratory system prognostics. Utilizing a 3D human dummy model with detailed internal organs of the upper thorax, combined with 3D AR visualization of the airway anatomy and three kinds of laryngoscopes, paramedics can obtain a visual and tactile sense of the proper ETI procedure.
The system consists of a Magic Leap head-mounted device (HMD) through which the trainee sees a virtual 3D model of the human dummy and the user interface superimposed onto the real world.