Imagine interacting with a robot simply by drawing on a touchpad. Our system leverages intrinsic tactile sensing: it uses data directly from the robot's joint torque sensors to understand your input. As you draw a digit on the touchpad, the robot detects subtle changes in the forces and moments at its joints, and these signals become the basis for real-time digit recognition. Because no external sensors are required, the approach reduces both complexity and cost.
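To make the sensing idea concrete, here is a minimal NumPy sketch of the standard way an external contact wrench can be recovered from joint torque readings. The function name, the rigid-body dynamic model, and the known contact-frame Jacobian are illustrative assumptions, not a description of our exact implementation:

```python
import numpy as np

def estimate_contact_wrench(tau_measured, tau_model, jacobian):
    """Estimate the external wrench (force + moment) from joint torques.

    tau_measured : (n,) torques reported by the robot's joint sensors
    tau_model    : (n,) torques predicted by the dynamic model
                   (gravity, friction, motion) in the absence of contact
    jacobian     : (6, n) geometric Jacobian at the assumed contact frame
    """
    # The residual between measurement and model is attributed to contact.
    tau_ext = tau_measured - tau_model
    # Statics relates the two via tau_ext = J^T F; solve for F in the
    # least-squares sense using the pseudoinverse of J^T.
    wrench, *_ = np.linalg.lstsq(jacobian.T, tau_ext, rcond=None)
    return wrench  # [Fx, Fy, Fz, Mx, My, Mz]
```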
To build a robust and intuitive system, we developed a standard digit drawing protocol, as shown in the video below. While the initial dataset was collected using this protocol, our goal was a system that recognizes digits even when they are drawn in reverse (e.g., tracing the digit "2" from right to left instead of left to right) or rotated by some angle. To achieve this, we applied data augmentation: transforming the samples collected under the standard protocol yielded synthetic data representing reversed and rotated digits, significantly improving the system's adaptability to diverse user inputs.
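As an illustration, the sketch below shows one plausible form of such augmentation, assuming each sample is stored as a 2-D trajectory sampled over time. The representation, angle set, and function names are assumptions made for the example rather than our exact pipeline:

```python
import numpy as np

def reverse_stroke(sample):
    # A digit drawn in the opposite direction corresponds to the
    # time-reversed sequence of the same trajectory.
    return sample[::-1]

def rotate_stroke(sample, angle_deg):
    # Rotate the (T, 2) trajectory about the origin by angle_deg.
    theta = np.deg2rad(angle_deg)
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    return sample @ rot.T

def augment(dataset, angles=(-30.0, -15.0, 15.0, 30.0)):
    """Expand (sample, label) pairs with reversed and rotated variants."""
    augmented = []
    for sample, label in dataset:
        variants = [sample, reverse_stroke(sample)]
        variants += [rotate_stroke(v, a) for v in variants for a in angles]
        augmented += [(v, label) for v in variants]
    return augmented
```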
Our system uses a Bidirectional Long Short-Term Memory (Bi-LSTM) network to classify the complex time-series data generated by digit drawing, an approach that effectively captures the spatio-temporal relationships within the data. The result? High-accuracy, real-time digit classification: the network achieves an overall accuracy of 94% across various test scenarios, including users who did not participate in the training phase and robot poses different from the one in which the training data was recorded.
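A minimal PyTorch sketch of such a Bi-LSTM classifier is shown below; the input width (six force/moment channels), hidden size, depth, and sequence length are illustrative assumptions, not our exact architecture:

```python
import torch
import torch.nn as nn

class DigitBiLSTM(nn.Module):
    """Bi-LSTM classifier for drawn-digit time series."""

    def __init__(self, n_features=6, hidden=64, n_layers=2, n_classes=10):
        super().__init__()
        self.lstm = nn.LSTM(n_features, hidden, n_layers,
                            batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, n_classes)

    def forward(self, x):
        # x: (batch, time, features), e.g. force/moment readings per step.
        out, _ = self.lstm(x)
        h = self.lstm.hidden_size
        # Concatenate the forward direction's last step with the backward
        # direction's first step, so both summarize the full sequence.
        summary = torch.cat([out[:, -1, :h], out[:, 0, h:]], dim=1)
        return self.head(summary)

model = DigitBiLSTM()
logits = model(torch.randn(8, 120, 6))  # 8 sequences, 120 steps, 6 channels
print(logits.shape)  # torch.Size([8, 10])
```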
Even with high-accuracy models, misclassifications can occur. Our system therefore employs a Hierarchical Finite State Machine (HFSM) to manage task execution and ensure safe interaction. After interpreting the drawn digit, the robot uses synthesized speech to audibly confirm the recognized command with the user, mitigating the risk of unintended actions; confirmation is then given with a simple tap on the robot's arm. This tap-to-confirm mechanism is an intuitive interface for the user, and the multimodal approach (touch + voice) leverages the naturalness of touch and voice communication, adding a layer of safety and user control.
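The sketch below captures the core confirm-before-execute logic in a compact form; the real HFSM nests sub-states (speaking, waiting, timing out) inside these top-level states, and the state names and callbacks here are illustrative:

```python
from enum import Enum, auto

class State(Enum):
    IDLE = auto()     # waiting for a drawn digit
    CONFIRM = auto()  # digit recognized, awaiting the user's tap
    EXECUTE = auto()  # command confirmed, task running

class DigitTaskFSM:
    def __init__(self, speak, start_task):
        self.state = State.IDLE
        self.speak = speak            # text-to-speech callback (assumed)
        self.start_task = start_task  # task-execution callback (assumed)
        self.pending_digit = None

    def on_digit(self, digit):
        if self.state is State.IDLE:
            self.pending_digit = digit
            self.speak(f"I recognized {digit}. Tap my arm to confirm.")
            self.state = State.CONFIRM

    def on_tap(self):
        # The tap itself is detected via the same intrinsic tactile sensing.
        if self.state is State.CONFIRM:
            self.state = State.EXECUTE
            self.start_task(self.pending_digit)

    def on_timeout(self):
        # No confirmation received: discard the command and return to idle.
        if self.state is State.CONFIRM:
            self.pending_digit = None
            self.state = State.IDLE

    def on_task_done(self):
        if self.state is State.EXECUTE:
            self.pending_digit = None
            self.state = State.IDLE
```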
To demonstrate the practicality of our system, we implemented a fruit delivery task. Users simply draw the digit corresponding to their desired fruit, and the robot autonomously retrieves and presents the item. This showcases the potential of intrinsic tactile sensing for intuitive and accessible human-robot collaboration in everyday scenarios.