AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch (2024)

Max Yang1,  Chenghua Lu1,  Alex Church2,  Yijiong Lin1,  Chris Ford1,  Haoran Li1,
Efi Psomopoulou1,  David A.W. Barton1*, Nathan F. Lepora1*
1University of Bristol   2Cambrian Robotics
https://maxyang27896.github.io/anyrotate/

Abstract

Human hands are capable of in-hand manipulation in the presence of different hand motions. For a robot hand, harnessing rich tactile information to achieve this level of dexterity remains a significant challenge. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We tackle this problem by training a dense tactile policy in simulation and present a sim-to-real method for rich tactile sensing to achieve zero-shot policy transfer. Our formulation allows the training of a unified policy to rotate unseen objects about arbitrary rotation axes in any hand direction. In our experiments, we highlight the benefit of capturing detailed contact information when handling objects with varying properties. Interestingly, despite not having explicit slip detection, we found that rich multi-fingered tactile sensing can implicitly detect object movement within the grasp and provide a reactive behavior that improves the robustness of the policy.

* These authors contributed equally.
Correspondence to max.yang@bristol.ac.uk.
[Figure 1]

Keywords: Tactile Sensing, In-hand Object Rotation, Reinforcement Learning

1 Introduction

The versatility of manipulating objects of varying shapes and sizes has been a long-standing goal of robot manipulation[1]. However, in-hand manipulation with multi-fingered hands can be hugely challenging due to the high degree of actuation, fine motor control, and large environment uncertainties. While significant advances have been made in recent years, most prominently the work by OpenAI[2, 3], these have relied primarily on vision-based systems, which are not necessarily well suited to this task due to significant self-occlusion. Overcoming these issues often requires multiple cameras and complicated setups that are not representative of natural embodiment.

More recently, researchers have begun to explore the object rotation problem with proprioception and touch sensing[4, 5], treating it as a representative task of general in-hand manipulation. The ability to rotate objects around any chosen axis in any hand orientation provides a useful set of primitives crucial for manipulating objects freely in space, even while the hand is in motion. However, this is challenging as it requires comprehension of complex interaction physics that cannot be visually observed, coupled with high-precision control of secure grasps in the presence of gravity (i.e., gravity invariance): it is harder to hold an object while manipulating it if the palm is not facing upwards, which has been the scenario in related work[2, 3, 4, 5]. Tactile sensing is expected to play a key role here as it enables the capture of detailed contact information to better control the robot-object interaction. However, rich tactile sensing for in-hand dexterous manipulation has not yet been fully exploited due to the large sim-to-real gap, often leading to a reduction of high-resolution tactile data to low-dimensional representations[6, 7]. One might expect that a more detailed tactile representation could increase in-hand dexterity and enable new tasks.

In this paper, we introduce AnyRotate: a robot system for performing multi-axis gravity-invariant in-hand object rotation with dense featured sim-to-real touch. Here, we propose to tackle this task with goal-conditioned RL and rich tactile sensing. We first present our sim-to-real formulation and dense tactile representation to train an accurate and precise policy for multi-axis object rotation. We then train an observation model to simultaneously predict continuous contact pose and contact force readings from tactile images to capture important features for precise manipulation under noisy conditions. In the real world, we mount tactile sensors onto the fingertips of a four-fingered fully-actuated robot hand to provide rich tactile feedback for performing stable in-hand object rotation.

Our principal contributions are, in summary:
1) An RL formulation using auxiliary goals for end-to-end learning of a unified policy, capable of achieving multi-axis in-hand object rotation in arbitrary hand orientations relative to gravity.
2) A dense tactile representation for learning in-hand manipulation. We highlight the benefit of acquiring detailed contact information for handling unseen objects with various physical properties.
3) We achieve zero-shot sim-to-real policy transfer and validate on 10 diverse objects in the real world. Our policy demonstrates strong robustness across various hand directions and rotation axes, and maintains high performance even when deployed on a rotating hand.

2 Related Work

Classical Control. Due to the complexity of the contact physics in dexterous manipulation, work on this topic has traditionally relied on planning and control using simplified models[8, 9, 10, 11, 12, 13, 14, 15]. With improved hardware and design, these methods have continued to demonstrate an increased level of dexterity[16, 17, 18, 19, 20, 21, 22, 23, 24]. While these methods offer performance guarantees, they are often limited by the underlying assumptions.

Dexterous Manipulation. With advances in machine learning, learning-based control has become a popular approach to achieving dexterity[3, 2, 25, 26]. However, most prior works rely on vision as the primary sense for object manipulation[27, 28, 29, 30], which requires continuous tracking of the object in a highly dynamic scene where occlusions can degrade performance. Vision also has difficulty capturing local contact information, which may be crucial for contact-rich tasks.

More recently, researchers have explored the in-hand object rotation task using proprioception and touch sensing[4, 7, 5]. This has so far been limited to rotation about the primary axes or training separate policies for arbitrary rotation axes with an upward-facing hand[31, 6]. In-hand manipulation in different hand orientations can be challenging as the hand must perform finger-gaiting while keeping the object stable against gravity. Several works[32, 33, 34, 35] achieved manipulation with a downward-facing hand using either a gravity curriculum or precision grasp manipulation, but the policies were still limited to a single hand orientation. In this work, we make a significant advance by training a unified policy to rotate objects about any chosen rotation axis in any hand direction, and for the first time achieve in-hand manipulation with a continuously moving and rotating hand.

Tactile Sensing. Vision-based tactile sensors have become increasingly popular due to their affordability, compactness, and ability to provide precise and detailed spatial information about contact through high-resolution tactile images[36, 37, 38, 39]. However, this fine-grained local contact information has not yet been fully utilized for in-hand dexterous manipulation. Previous studies have reduced the high-resolution tactile images to binary contact[7] or discretized contact location[6] to reduce the sim-to-real gap. In contrast, our system utilizes a dense featured tactile representation consisting of the full contact pose and contact force. We show that this tactile representation can capture important interaction physics that is valuable for dexterous manipulation under unknown disturbances, outperforming prior baselines.

Sim-to-real Methods. Learning in simulation for tactile robotics has gained appeal as it avoids the practical limitations of large data collection in real-world interactions. This trend has been driven by advancements in high-fidelity tactile simulators[40, 41, 42, 43], as well as various sim-to-real approaches[44, 45, 46, 47, 48]. Several works have proposed using high-frequency rendering of tactile images for sim-to-real RL[44, 49, 50]. However, this can be computationally expensive and inefficient, limiting these methods to simpler robotic systems. In this work, we extend the sim-to-real framework of Yang et al.[51] by proposing an approach to predict full contact pose and contact force, and apply it to a dexterous manipulation task with a robot hand.

3 Method

We perform in-hand object rotation via stable precision grasping, a process for continuous object rotation without a supporting surface. Gravity invariance is addressed by randomly initializing the hand orientation between episodes. This is a difficult exploration problem: the fingers and object must move while constantly maintaining stability, as any finger misplacement can induce slip and lead to an irreversible state where the object is dropped. To obtain a general policy for multi-axis in-hand object rotation, we formulate the object-rotation problem as object reorientation and adopt a two-stage learning process. First, a teacher is trained with privileged information and reinforcement learning[52], using an auxiliary goal formulation and an adaptive curriculum for sample-efficient training. A student is then trained via supervised learning to imitate the teacher's policy given only real-world observations. During both stages, we provide the agent with rich tactile feedback. To bridge the sim-to-real gap for rich tactile sensing, we collect contact data to train an observation model, which allows zero-shot policy transfer to the real world. An overview of the method is shown in Figures 2 and 3.

3.1 Multi-axis In-hand Object Rotation

[Figure 2]

We formulate the task as a finite-horizon goal-conditioned Markov Decision Process (MDP) $\mathcal{M}=(\mathcal{S},\mathcal{A},\mathcal{R},\mathcal{P},\mathcal{G})$, defined by a continuous state space $s\in\mathcal{S}$, a continuous action space $a\in\mathcal{A}$, a probabilistic state transition function $p(s_{t+1}|s_t,a_t)\in\mathcal{P}$, a goal $g\in\mathcal{G}$, and a reward function $r\in\mathcal{R}:\mathcal{S}\times\mathcal{A}\times\mathcal{G}\rightarrow\mathbb{R}$. At each time step $t$, the agent selects an action $a_t$ from the current policy $\pi(a_t|s_t,g)$ and receives a reward $r$.
The aim is to obtain a policy $\pi_\theta^*$, parameterized by $\theta$, that maximizes the expected return $\mathbb{E}_{\tau\sim p_\pi(\tau),\,g\sim q(g)}\left[\sum_{t=0}^{T}\gamma^t r(s_t,a_t,g)\right]$ over an episode $\tau$.

Observations. The observation $O_t$ contains the current and target joint positions $q_t,\bar{q}_t\in\mathbb{R}^{16}$, the previous action $a_{t-1}\in\mathbb{R}^{16}$, fingertip positions $f^p_t\in\mathbb{R}^{12}$, fingertip orientations $f^r_t\in\mathbb{R}^{16}$, binary contacts $c_t\in\{0,1\}^4$, contact poses $P_t\in\mathbb{S}^4$, contact force magnitudes $F_t\in\mathbb{R}^4$, and the desired rotation axis $\hat{k}\in\mathbb{S}^2$.
The privileged information provided to the teacher includes object position, orientation, angular velocity, dimensions, gravity force vector, and the current goal orientation.

Action Space. At each time step, the action output from the policy is $a_t:=\Delta\theta\in\mathbb{R}^{16}$, the relative joint positions of the robot hand. To encourage smooth finger motion, we apply an exponential moving average to compute the target joint positions, defined as $\bar{q}_t=\bar{q}_{t-1}+\tilde{a}_t$, where $\tilde{a}_t=\eta a_t+(1-\eta)a_{t-1}$. We control the hand at 20 Hz and limit the action to $\Delta\theta\in[-0.026, 0.026]^{16}$ rad.
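The relative-action update above can be sketched as follows. Only the update rule and the 0.026 rad limit come from the text; the EMA coefficient value and helper names are our own illustrative choices.

```python
import numpy as np

NUM_JOINTS = 16
ACTION_LIMIT = 0.026  # rad, per-step relative joint limit (from the text)
ETA = 0.8             # EMA coefficient eta (illustrative value)

def smooth_action(a_t, a_prev, eta=ETA):
    """a_tilde_t = eta * a_t + (1 - eta) * a_{t-1}."""
    return eta * a_t + (1.0 - eta) * a_prev

def target_joint_positions(q_bar_prev, a_t, a_prev):
    """q_bar_t = q_bar_{t-1} + a_tilde_t, with actions clipped to the limit."""
    a_t = np.clip(a_t, -ACTION_LIMIT, ACTION_LIMIT)
    a_tilde = smooth_action(a_t, a_prev)
    return q_bar_prev + a_tilde
```

Filtering the action rather than the joint target keeps the per-step joint change bounded while still low-pass filtering the policy output.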

Simulated Touch. We approximate the sensor as a rigid body and fetch the contact information from its sensing surface: the local contact position $(c_x, c_y, c_z)$ for computing contact pose, and the net contact force $(F_x, F_y, F_z)$ for computing contact force magnitude. We apply an exponential moving average to the contact force readings to simulate sensing delay due to elastic deformation. We also saturate and re-scale the contact values to the sensing ranges experienced in reality. The contact force is used to compute binary contact signals using a threshold similar to that of the real sensor.
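A minimal sketch of this contact-signal processing, assuming the steps described above (EMA filtering, saturation/re-scaling, thresholding); all constants and names here are illustrative, not the paper's values:

```python
import numpy as np

FORCE_MAX = 5.0       # N, saturation matching the real sensor range (illustrative)
CONTACT_THRESH = 0.1  # N, binary-contact threshold (illustrative)
ALPHA = 0.5           # EMA coefficient simulating elastic sensing delay (illustrative)

def process_contact_force(f_raw, f_filt_prev, alpha=ALPHA):
    """EMA-filter the net contact force magnitude, then saturate and
    re-scale to [0, 1]; also derive the binary contact signal."""
    f_mag = np.linalg.norm(f_raw, axis=-1)           # |(Fx, Fy, Fz)|
    f_filt = alpha * f_mag + (1.0 - alpha) * f_filt_prev
    f_scaled = np.clip(f_filt, 0.0, FORCE_MAX) / FORCE_MAX
    binary_contact = (f_filt > CONTACT_THRESH).astype(np.float32)
    return f_scaled, f_filt, binary_contact
```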

Auxiliary Goal. For training a unified policy for multi-axis rotation, a formulation using angular velocity leads to inefficient training and convergence difficulties, as will be shown in Section 5.1. Instead, we formulate the problem as object reorientation to a moving target. Targets are generated by rotating the current object orientation about the desired rotation axis at regular intervals. When a target is reached, a new one is generated about the rotation axis until the episode ends.
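The goal-generation step can be sketched with quaternions as follows; the interval size and function names are illustrative assumptions, and we assume the rotation axis is expressed in the world frame:

```python
import numpy as np

GOAL_INTERVAL = np.deg2rad(45)  # rotation increment per auxiliary goal (illustrative)

def quat_mul(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def next_goal(obj_quat, axis, angle=GOAL_INTERVAL):
    """Next auxiliary goal: the current object orientation rotated by
    `angle` about the desired (world-frame) rotation axis."""
    axis = np.asarray(axis) / np.linalg.norm(axis)
    dq = np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))
    return quat_mul(dq, obj_quat)  # pre-multiply: rotation in world frame
```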

Reward Design. In the following, we provide an intuitive explanation of the goal-based reward used for learning multi-axis object rotation (full details in Appendix B):

$$r = r_{\rm rotation} + r_{\rm contact} + r_{\rm stable} + r_{\rm terminate} \qquad (1)$$

The object rotation objective is defined by $r_{\rm rotation}$. We use a keypoint formulation $\mathcal{K}(\|k^{\rm o}_i - k^{\rm g}_i\|)$ to define target poses[29] and apply a keypoint distance threshold to provide a goal update tolerance $d_{\rm tol}$. We augment this reward with a sparse bonus when a goal is reached and a delta rotation reward to encourage continuous rotation. Next, we use $r_{\rm contact}$ to maximize contact sensing, rewarding tip contacts and penalizing contacts with any other parts of the hand. We also include several terms to encourage stable rotations, $r_{\rm stable}$, comprising: an object angular velocity penalty; a hand-pose penalty on the distance of the joint positions from a canonical pose; a controller work-done penalty; and a controller torque penalty. Finally, we include an early termination penalty, $r_{\rm terminate}$, applied if the object falls out of the grasp or the rotation axis deviates too far from the desired axis.
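The keypoint term and goal-update check can be sketched as below. The exponential shaping, bonus magnitude, and tolerance value are illustrative stand-ins, not the coefficients from Appendix B:

```python
import numpy as np

D_TOL = 0.025  # goal update tolerance on keypoint distance (illustrative, metres)

def keypoint_distance(kp_obj, kp_goal):
    """Mean Euclidean distance between corresponding object and goal keypoints."""
    return np.mean(np.linalg.norm(kp_obj - kp_goal, axis=-1))

def rotation_reward(kp_obj, kp_goal, bonus=250.0, scale=2.0):
    """Shaped keypoint reward plus a sparse bonus when the goal is reached
    (i.e. all keypoints within the tolerance d_tol on average)."""
    d = keypoint_distance(kp_obj, kp_goal)
    goal_reached = d < D_TOL
    reward = np.exp(-scale * d) + (bonus if goal_reached else 0.0)
    return reward, goal_reached
```

When `goal_reached` is true, the environment would generate the next auxiliary goal about the desired rotation axis.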

Adaptive Curriculum. The precision-grasp object rotation task can be separated into two key phases of learning: first, stably grasping the object in different hand orientations; then, rotating the object stably about the desired rotation axis. While the $r_{\rm contact}$ and $r_{\rm stable}$ reward terms are beneficial for sim-to-real transfer of the final policy, they can hinder the learning process, resulting in a local optimum where the object is stably grasped without being rotated. To alleviate this issue, we apply a reward curriculum coefficient, $\lambda_{\rm rew}(r_{\rm contact} + r_{\rm stable})$, which increases linearly with the average number of rotations achieved per episode.
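A minimal sketch of this curriculum, assuming a linear ramp that saturates at 1; the ramp rate is an illustrative assumption:

```python
def curriculum_coefficient(avg_rotations, ramp=2.0):
    """Reward-curriculum coefficient lambda_rew, growing linearly with the
    average rotations per episode and saturating at 1 (ramp is illustrative)."""
    return min(max(avg_rotations / ramp, 0.0), 1.0)

def shaped_reward(r_rotation, r_contact, r_stable, r_terminate, avg_rotations):
    """Total reward with the curriculum applied to the stability terms only,
    so early training is dominated by the rotation objective."""
    lam = curriculum_coefficient(avg_rotations)
    return r_rotation + lam * (r_contact + r_stable) + r_terminate
```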

3.2 Teacher-Student Policy Distillation

The training in Section 3.1 uses privileged information, such as object properties and the auxiliary goal pose. Similar to previous work[5, 6], we use policy distillation to train a student that relies only on proprioception and tactile feedback. The student policy has the same actor-critic architecture as the teacher policy, $a_t = \pi_\theta(O_t, a_{t-1}, z_t)$, and returns a Gaussian distribution with diagonal covariance, $a_t \sim \mathcal{N}(\mu_\theta, \Sigma_\theta)$. The latent vector $z_t = \phi(O_t, O_{t-1}, \dots, O_{t-n})$ is a low-dimensional encoding predicted from a sequence of $n$ proprioceptive and tactile observations. We use a temporal convolutional network (TCN) encoder for the latent vector function $\phi(\cdot)$.

Training. The student encoder is randomly initialized and the policy network is initialized with the weights of the teacher policy. We train both the encoder and policy network via supervised learning, minimizing the mean squared error (MSE) between the latent vectors $z_t$ and $\bar{z}_t$ and the negative log-likelihood (NLL) between the action distributions $a_t$ and $\bar{a}_t$. Without explicit object or goal information, we found the student policy unable to achieve the same level of goal-reaching accuracy as the teacher, which can lead to missed goals and out-of-distribution data being collected. To alleviate this issue, we increase the goal update tolerance $d_{\rm tol}$ during student training.
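The combined distillation objective can be sketched as follows (numpy for clarity rather than an autodiff framework; the loss weighting and function names are illustrative):

```python
import numpy as np

def gaussian_nll(mu, log_std, target):
    """Negative log-likelihood of `target` under a diagonal Gaussian
    with mean `mu` and per-dimension log standard deviation `log_std`."""
    var = np.exp(2.0 * log_std)
    return 0.5 * np.sum((target - mu) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi))

def distillation_loss(z_student, z_teacher, mu_s, log_std_s, a_teacher, w_latent=1.0):
    """Student loss: MSE between latent vectors plus NLL of the teacher's
    action under the student's action distribution (weight illustrative)."""
    mse = np.mean((z_student - z_teacher) ** 2)
    nll = gaussian_nll(mu_s, log_std_s, a_teacher)
    return w_latent * mse + nll
```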

3.3 Dense Featured Touch

[Figure 3]

The dense tactile observations consist of contact pose and contact force. We use spherical coordinates defined by the contact pose variables: polar angle $R_x$ and azimuthal angle $R_y$. The contact force variable is the magnitude of the 3D contact force, $\|F\|$. For sim-to-real transfer, we train an observation model to extract these features from tactile images and perform zero-shot policy transfer[51].
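In simulation these features can be computed directly from the contact quantities of Section 3.1; a minimal sketch, assuming the sensor's outward axis is local z (the frame convention is our assumption, not stated in the text):

```python
import numpy as np

def dense_touch_features(c_pos, f_vec):
    """Map a local contact position on the sensing surface to spherical
    contact-pose angles (Rx: polar, Ry: azimuthal), and the net contact
    force vector to its magnitude ||F||."""
    cx, cy, cz = c_pos
    r = np.linalg.norm(c_pos)
    Rx = np.arccos(cz / r)      # polar angle from the sensor's outward (z) axis
    Ry = np.arctan2(cy, cx)     # azimuthal angle in the sensor's tangent plane
    F = np.linalg.norm(f_vec)   # contact force magnitude
    return Rx, Ry, F
```

In the real world, the same $(R_x, R_y, \|F\|)$ triplet is instead predicted from tactile images by the observation model.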

Data Collection. We use a 6-DoF UR5 robot arm with the tactile sensor attached to the end effector and an F/T sensor placed on the workspace platform. The tactile sensor is moved over the surface of a flat stimulus mounted above the F/T sensor, at randomly sampled poses. For each interaction, we store tactile images with the corresponding pose and force labels. We then train a CNN model to extract these explicit contact features from the tactile images. More details are given in Appendix H.

Deployment. We apply a Structural Similarity Index (SSIM) threshold between the current and reference tactile images to compute binary contact, which is also used to mask the contact pose and force predictions. Given tactile images from each fingertip, we use the observation models to obtain the dense contact features, which are then used as tactile observations for the policy. An overview of the tactile prediction pipeline is shown in Figure 3.
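The SSIM-based contact test can be sketched as below. Standard windowed SSIM implementations exist (e.g. in scikit-image); for self-containment we use a single-window global SSIM, and the threshold value is an illustrative assumption:

```python
import numpy as np

SSIM_THRESH = 0.6  # below this similarity to the no-contact reference,
                   # we flag contact (illustrative value)

def global_ssim(img_a, img_b, data_range=255.0):
    """Global (single-window) SSIM between two grayscale images; a
    lightweight stand-in for a full windowed SSIM implementation."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = img_a.mean(), img_b.mean()
    var_a, var_b = img_a.var(), img_b.var()
    cov = ((img_a - mu_a) * (img_b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def binary_contact(tactile_img, reference_img, thresh=SSIM_THRESH):
    """Contact is detected when the current image diverges sufficiently
    from the no-contact reference image."""
    return global_ssim(tactile_img, reference_img) < thresh
```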

4 System Setup

Real-world. We use a 16-DoF Allegro Hand with finger-like, front-facing vision-based tactile sensors attached to each of its fingertips. Each sensor can be streamed asynchronously along with the joint positions from the hand. The target joint commands are sent at a control rate of 20 Hz. The hand is attached to the end effector of a UR5 to provide different hand orientations for performing in-hand object rotation, as shown in Figure 1.

Simulation. We use Isaac Gym[53] for training the teacher and student policies. Each environment contains a simulated Allegro Hand with tactile sensors attached to each fingertip. Gravity is enabled for both the hand and the object. We perform system identification on simulation parameters in various hand directions to reduce the sim-to-real gap (detailed in Appendix E). We run the simulation at $dt = 1/60$ s and the policy control at 20 Hz.

[Figure 4]

Object Set. We use fundamental geometric shapes in Isaac Gym (capsule and box) for training. In simulation, we test on two out-of-distribution (OOD) object sets (see Figure 4): 1) OOD Mass, the training objects with heavier mass; 2) OOD Shape, a selection of unseen objects with different shapes. In the real world, we select 10 objects with different properties (see Table 8) to test the generalizability of the policy.

Evaluation. We run each experiment for 600 steps (equating to 30 seconds) and use the following metrics:
(i) Rotation Count (Rot) - the total number of rotations about the desired axis achieved per episode. In the real world, this is counted manually using reference markers attached to the object (visible as the tape in Figure 4).
(ii) Time to Terminate (TTT) - the time taken before the object gets stuck, falls out of the grasp, or the rotation axis deviates too far from the target.
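In simulation, the rotation count can be accumulated from the object's orientation trajectory; one way to do this (our sketch, not the paper's stated implementation, which counts manually in the real world) is to extract the twist of each incremental rotation about the desired axis:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rotation_count_increment(q_prev, q_curr, axis):
    """Twist of the incremental rotation about the desired axis, in
    revolutions (positive with the right-hand rule about `axis`)."""
    qd = quat_mul(q_curr, quat_conj(q_prev))        # delta rotation
    twist = 2.0 * np.arctan2(np.dot(qd[1:], axis), qd[0])
    return twist / (2.0 * np.pi)
```

Summing these increments over an episode gives the total rotation count about $\hat{k}$.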

5 Experiments and Analysis

First, we investigate our auxiliary goal formulation and adaptive curriculum for learning the multi-axis object rotation task (Section 5.1). Then we study the importance of rich tactile sensing for learning this dexterous manipulation task and conduct a quantitative analysis of the generalizability of the learned policies (Section 5.2). Finally, using the proposed sim-to-real approach, we deploy the policies in the real world on a range of object rotation tasks (Section 5.3).

Table 1: Rotation count (Rot) and episode length (EpLen) on the OOD object sets (mean ± std).

| Observation | OOD Mass Rot | OOD Mass EpLen (s) | OOD Shape Rot | OOD Shape EpLen (s) |
|---|---|---|---|---|
| Fixed Hand Orn | 0.55 ± 0.06 | 11.8 ± 0.2 | 0.55 ± 0.04 | 19.1 ± 0.5 |
| Proprio | 1.34 ± 0.07 | 21.5 ± 0.5 | 0.82 ± 0.02 | 25.1 ± 0.3 |
| Binary Touch | 1.90 ± 0.04 | 20.8 ± 0.5 | 1.57 ± 0.05 | 25.3 ± 0.2 |
| Discrete Touch | 1.95 ± 0.15 | 22.2 ± 0.4 | 1.67 ± 0.08 | 26.5 ± 0.1 |
| Dense Touch w/o Pose | 2.05 ± 0.04 | 22.0 ± 0.8 | 1.60 ± 0.02 | 25.5 ± 0.4 |
| Dense Touch w/o Force | 2.05 ± 0.05 | 21.9 ± 0.1 | 1.73 ± 0.03 | 26.7 ± 0.0 |
| Dense Touch | **2.18 ± 0.05** | **22.8 ± 0.8** | **1.77 ± 0.01** | **27.2 ± 0.3** |

5.1 Training Performance


We compare our auxiliary goal formulation against one that uses an angular rotation objective (w/o auxiliary goal), a common formulation of object rotation in prior work[5, 6, 31]. We also compare learning without the adaptive curriculum (w/o curriculum). More details of the baselines can be found in Appendices B.2 and B.3. The learning curves for average rotation and successive goals reached are shown in Figure 6. While the agent can learn to rotate objects in the single-axis setting using an angular rotation objective, this resulted in much lower accuracy, obtaining near-zero successive goals reached despite having a rotation axis penalty. In the multi-axis setting, training was unsuccessful and the agent was unable to maintain stable rotation. The agent also failed to learn without the adaptive reward curriculum: learning gets stuck at a local optimum where the agent keeps the object stable without rotating it. This is a difficult exploration problem, and without the guidance of an adaptive curriculum the agent cannot escape this local optimum.

5.2 Simulation Results

The results for multi-axis rotation in arbitrary hand orientations are shown in Table 5. First, we observe that a policy trained in a fixed hand orientation performed poorly in arbitrary hand orientations, suggesting that gravity invariance adds considerable complexity to the task despite the use of precision-grasp manipulation. Table 1 compares our dense touch policy (contact pose and contact force) with policies trained with proprioception, binary touch, and discrete touch (a discretized representation introduced in [6]).

Contrary to the findings in [6], we find binary touch to be beneficial over proprioception alone. In our case, we attribute this to including binary contact information during teacher training, which provides a better base policy. Overall, we found that performance improved with more detailed tactile sensing. The dense touch policy, trained with information regarding contact pose and force, outperformed policies that used simpler, less detailed touch. Moreover, discretizing the contact location led to a drop in performance compared with contact pose, suggesting that this type of representation is not as well suited to the morphology of our front-facing sensor. The ablation studies showed that contact force can provide useful information regarding the interaction physics, which improves the performance when handling objects with different mass properties; in addition, excluding either feature of dense touch resulted in suboptimal performance.

5.3 Real-world Results

| Observation | Palm Up | Palm Down | Base Up | Base Down | Thumb Up | Thumb Down |
|---|---|---|---|---|---|---|
| | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) |
| Proprio | 1.47±0.69 / 27.6±3.8 | 1.05±0.37 / 25.3±4.0 | 0.84±0.30 / 26.8±3.6 | 0.87±0.46 / 22.8±9.6 | 0.78±0.53 / 20.3±9.9 | 0.51±0.65 / 9.50±8.9 |
| Binary Touch | 1.32±0.52 / 25.5±6.5 | 0.89±0.28 / 23.8±4.6 | 0.86±0.32 / 25.3±6.2 | 0.77±0.28 / 23.0±4.7 | 0.83±0.49 / 22.6±9.0 | 0.47±0.32 / 13.2±5.7 |
| Dense Touch | **1.57±0.57 / 30.0±0.0** | **1.33±0.44 / 28.2±3.1** | **1.32±0.32 / 29.8±0.6** | **1.17±0.38 / 29.4±1.8** | **1.08±0.47 / 27.9±3.1** | **0.91±0.33 / 29.2±2.0** |

Object rotation performance for various hand orientations and rotation axes is given in Tables 1 and 7. In both cases, the dense touch policy performed best, demonstrating successful transfer of the dense tactile observations. The proprioception and binary touch policies were less effective at maintaining stable rotation, often losing contact or getting stuck.

Hand orientations. Performance dropped progressively from the palm-up and palm-down directions, through base up and base down, to the thumb-up and thumb-down directions. We attribute this to the larger sim-to-real gap when the fingers are positioned horizontally during manipulation: in these orientations, gravity loading on the fingers acts against actuation, which weakens the hand. Nevertheless, despite the noisy system, the policy provided with rich tactile information consistently demonstrated stable performance. Examples are shown in Figure 9.

| Observation | x-axis (Rot / TTT(s)) | y-axis (Rot / TTT(s)) | z-axis (Rot / TTT(s)) |
|---|---|---|---|
| Proprio | 0.35±0.33 / 16.6±12.6 | 0.17±0.19 / 8.33±8.5 | 1.05±0.37 / 25.3±4.0 |
| Binary Touch | 0.87±0.43 / 26.5±5.4 | 0.25±0.18 / 15.9±10.5 | 0.89±0.28 / 23.8±4.6 |
| Dense Touch | **1.33±0.50 / 28.6±2.8** | **0.79±0.37 / 27.8±4.8** | **1.33±0.44 / 28.2±3.1** |

Rotation Axis. Rotation about the z-axis was the easiest to achieve, followed by the x- and y-axes. We noticed that binary touch produced results similar to proprioception when rotating about the z-axis, but performed better for the x- and y-rotation axes. The latter axes require two fingers to hold the object steady (middle/thumb or index/pinky) while the remaining two fingers provide a stable rotating motion. This requires more sophisticated finger gaiting, and the policy struggled to perform well with proprioception alone.

Tactile Sensor Response. An analysis of the processed tactile sensor outputs during a rollout is shown in Figure 9. The two key motions for stable object rotation can be seen in the output pose and force. Given rich tactile sensing on a multi-fingered hand, the policy can detect when the object is slipping out of a stable grasp and produce reactive finger-gaiting motions that prevent it from slipping further. This emergent behavior was not seen when using proprioception or binary touch.

Gravity Invariance. We also demonstrate that the trained policy can adapt effectively to a rotating hand, where the gravity vector is continuously changing in the hand's frame of reference. Examples of three hand trajectories are provided in Appendix K.2 and the accompanying Supplementary Video. The capability to manipulate objects during angular movements of the hand enables 6D reorientation of the object while simultaneously repositioning the grasp location, giving robot hands a new level of dexterity that could be beneficial in many tasks, e.g., general pick-and-place.
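As a minimal illustration of what gravity invariance entails, the gravity direction expressed in the hand's frame must be recomputed from the hand orientation at every step. The helper below is a hypothetical sketch (not the paper's code), assuming a row-major 3x3 rotation matrix that maps hand-frame vectors to world-frame vectors:

```python
def gravity_in_hand_frame(R_hand_to_world):
    """Express the world gravity direction (0, 0, -1) in the hand frame:
    g_hand = R^T g_world, where R is a row-major 3x3 rotation matrix
    mapping hand-frame vectors to world-frame vectors (an assumed convention)."""
    g_world = (0.0, 0.0, -1.0)
    # Multiplying by the transpose: component j of g_hand is column j of R
    # dotted with g_world.
    return tuple(
        sum(R_hand_to_world[i][j] * g_world[i] for i in range(3))
        for j in range(3)
    )
```

With the identity orientation, gravity points along the hand's -z axis; after flipping the hand palm-down (180 degrees about x), it points along +z in the hand frame, which is the kind of continuous change the policy must handle as the arm moves.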


6 Conclusion and Limitations

In this paper, we demonstrated the capability of a general policy leveraging rich tactile sensing to perform in-hand object rotation about any rotation axis in any hand direction. This marks a significant step toward more general tactile dexterity with fully-actuated multi-fingered robot hands.

While dense touch generally gave the best performance, it still had difficulty with box-shaped objects and objects with larger aspect ratios. We attribute this to certain grasping points producing similar tactile information for different states of the system. Richer tactile representations, such as tactile images or contact force fields, or integrating vision, could help infer additional properties and enhance robustness. In addition, the actuation of the Allegro Hand was significantly weakened under certain hand orientations; designing low-cost and more capable hardware is therefore crucial for advancing dexterous manipulation with multi-fingered robotic hands.

The ability to manipulate objects effortlessly in free space using a sense of touch mirrors a key aspect of human dexterity and stands as a significant goal in robot manipulation. We hope that our research underscores the importance of tactile sensing and spurs continued efforts towards this goal.

Acknowledgments

We thank Andrew Stinchcombe for helping with the 3D-printing of the stimuli and tactile sensors. We thank Haozhi Qi for the valuable discussions. This work was supported by the EPSRC Doctoral Training Partnership (DTP) scholarship.

References

  • Okamura etal. [2000]A.M. Okamura, N.Smaby, and M.R. Cutkosky.An overview of dexterous manipulation.In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), volume1, pages 255–262. IEEE, 2000.
  • Akkaya et al. [2019]I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al. Solving Rubik's Cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.
  • Andrychowicz etal. [2020]O.M. Andrychowicz, B.Baker, M.Chociej, R.Jozefowicz, B.McGrew, J.Pachocki, A.Petron, M.Plappert, G.Powell, A.Ray, etal.Learning dexterous in-hand manipulation.The International Journal of Robotics Research, 39(1):3–20, 2020.
  • Khandate etal. [2022]G.Khandate, M.Haas-Heger, and M.Ciocarlie.On the feasibility of learning finger-gaiting in-hand manipulation with intrinsic sensing.In 2022 International Conference on Robotics and Automation (ICRA), pages 2752–2758. IEEE, 2022.
  • Qi etal. [2023a]H.Qi, A.Kumar, R.Calandra, Y.Ma, and J.Malik.In-hand object rotation via rapid motor adaptation.In Conference on Robot Learning, pages 1722–1732. PMLR, 2023a.
  • Qi etal. [2023b]H.Qi, B.Yi, S.Suresh, M.Lambeta, Y.Ma, R.Calandra, and J.Malik.General in-hand object rotation with vision and touch.In Conference on Robot Learning, pages 2549–2564. PMLR, 2023b.
  • Khandate etal. [2023]G.Khandate, S.Shang, E.T. Chang, T.L. Saidi, J.Adams, and M.Ciocarlie.Sampling-based exploration for reinforcement learning of dexterous manipulation.arXiv preprint arXiv:2303.03486, 2023.
  • Han and Trinkle [1998]L.Han and J.C. Trinkle.Dextrous manipulation by rolling and finger gaiting.In Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146), volume1, pages 730–735. IEEE, 1998.
  • Han etal. [1997]L.Han, Y.-S. Guan, Z.Li, Q.Shi, and J.C. Trinkle.Dextrous manipulation with rolling contacts.In Proceedings of International Conference on Robotics and Automation, volume2, pages 992–997. IEEE, 1997.
  • Bicchi and Sorrentino [1995]A.Bicchi and R.Sorrentino.Dexterous manipulation through rolling.In Proceedings of 1995 IEEE International Conference on Robotics and Automation, volume1, pages 452–457. IEEE, 1995.
  • Rus [1999]D.Rus.In-hand dexterous manipulation of piecewise-smooth 3-d objects.The International Journal of Robotics Research, 18(4):355–381, 1999.
  • Fearing [1986]R.Fearing.Implementing a force strategy for object re-orientation.In Proceedings. 1986 IEEE International Conference on Robotics and Automation, volume3, pages 96–102. IEEE, 1986.
  • Leveroni and Salisbury [1996]S.Leveroni and K.Salisbury.Reorienting objects with a robot hand using grasp gaits.In Robotics Research: The Seventh International Symposium, pages 39–51. Springer, 1996.
  • Platt et al. [2004]R. Platt, A.H. Fagg, and R.A. Grupen. Manipulation gaits: Sequences of grasp control tasks. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA'04. 2004, volume 1, pages 801–806. IEEE, 2004.
  • Saut etal. [2007]J.-P. Saut, A.Sahbani, S.El-Khoury, and V.Perdereau.Dexterous manipulation planning using probabilistic roadmaps in continuous grasp subspaces.In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2907–2912. IEEE, 2007.
  • Bai and Liu [2014]Y.Bai and C.K. Liu.Dexterous manipulation using both palm and fingers.In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1560–1565. IEEE, 2014.
  • Shi etal. [2017]J.Shi, J.Z. Woodruff, P.B. Umbanhowar, and K.M. Lynch.Dynamic in-hand sliding manipulation.IEEE Transactions on Robotics, 33(4):778–795, 2017.
  • Teeple etal. [2022]C.B. Teeple, B.Aktaş, M.C. Yuen, G.R. Kim, R.D. Howe, and R.J. Wood.Controlling palm-object interactions via friction for enhanced in-hand manipulation.IEEE Robotics and Automation Letters, 7(2):2258–2265, 2022.
  • Fan etal. [2017]Y.Fan, W.Gao, W.Chen, and M.Tomizuka.Real-time finger gaits planning for dexterous manipulation.IFAC-PapersOnLine, 50(1):12765–12772, 2017.
  • Sundaralingam and Hermans [2018]B.Sundaralingam and T.Hermans.Geometric in-hand regrasp planning: Alternating optimization of finger gaits and in-grasp manipulation.In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 231–238. IEEE, 2018.
  • Morgan etal. [2022]A.S. Morgan, K.Hang, B.Wen, K.Bekris, and A.M. Dollar.Complex in-hand manipulation via compliance-enabled finger gaiting and multi-modal planning.IEEE Robotics and Automation Letters, 7(2):4821–4828, 2022.
  • Khadivar and Billard [2023]F.Khadivar and A.Billard.Adaptive fingers coordination for robust grasp and in-hand manipulation under disturbances and unknown dynamics.IEEE Transactions on Robotics, 2023.
  • Gao etal. [2023]X.Gao, K.Yao, F.Khadivar, and A.Billard.Real-time motion planning for in-hand manipulation with a multi-fingered hand.arXiv preprint arXiv:2309.06955, 2023.
  • Pang etal. [2023]T.Pang, H.T. Suh, L.Yang, and R.Tedrake.Global planning for contact-rich manipulation via local smoothing of quasi-dynamic contact models.IEEE Transactions on Robotics, 2023.
  • Nagabandi etal. [2020]A.Nagabandi, K.Konolige, S.Levine, and V.Kumar.Deep dynamics models for learning dexterous manipulation.In Conference on Robot Learning, pages 1101–1112. PMLR, 2020.
  • Huang etal. [2023]B.Huang, Y.Chen, T.Wang, Y.Qin, Y.Yang, N.Atanasov, and X.Wang.Dynamic handover: Throw and catch with bimanual hands.arXiv preprint arXiv:2309.05655, 2023.
  • Chen etal. [2022a]T.Chen, J.Xu, and P.Agrawal.A system for general in-hand object re-orientation.In Conference on Robot Learning, pages 297–307. PMLR, 2022a.
  • Chen etal. [2022b]T.Chen, M.Tippur, S.Wu, V.Kumar, E.Adelson, and P.Agrawal.Visual dexterity: In-hand dexterous manipulation from depth.arXiv preprint arXiv:2211.11744, 2022b.
  • Allshire et al. [2022]A. Allshire, M. Mittal, V. Lodaya, V. Makoviychuk, D. Makoviichuk, F. Widmaier, M. Wüthrich, S. Bauer, A. Handa, and A. Garg. Transferring dexterous manipulation from gpu simulation to a remote real-world trifinger. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11802–11809. IEEE, 2022.
  • Handa etal. [2023]A.Handa, A.Allshire, V.Makoviychuk, A.Petrenko, R.Singh, J.Liu, D.Makoviichuk, K.VanWyk, A.Zhurkevich, B.Sundaralingam, etal.Dextreme: Transfer of agile in-hand manipulation from simulation to reality.In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5977–5984. IEEE, 2023.
  • Yin etal. [2023]Z.-H. Yin, B.Huang, Y.Qin, Q.Chen, and X.Wang.Rotating without seeing: Towards in-hand dexterity through touch.arXiv preprint arXiv:2303.10880, 2023.
  • Chen etal. [2023]T.Chen, M.Tippur, S.Wu, V.Kumar, E.Adelson, and P.Agrawal.Visual dexterity: In-hand reorientation of novel and complex object shapes.Science Robotics, 8(84):eadc9244, 2023.
  • Sievers etal. [2022]L.Sievers, J.Pitz, and B.Bäuml.Learning purely tactile in-hand manipulation with a torque-controlled hand.In 2022 International Conference on Robotics and Automation (ICRA), pages 2745–2751. IEEE, 2022.
  • Röstel etal. [2023]L.Röstel, J.Pitz, L.Sievers, and B.Bäuml.Estimator-coupled reinforcement learning for robust purely tactile in-hand manipulation.In 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids), pages 1–8. IEEE, 2023.
  • Pitz etal. [2023]J.Pitz, L.Röstel, L.Sievers, and B.Bäuml.Dextrous tactile in-hand manipulation using a modular reinforcement learning architecture.arXiv preprint arXiv:2303.04705, 2023.
  • Yuan etal. [2017]W.Yuan, S.Dong, and E.H. Adelson.GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force.Sensors, 17(12):2762, Dec. 2017.doi:10.3390/s17122762.
  • Ward-Cherrier etal. [2018]B.Ward-Cherrier, N.Pestell, L.Cramphorn, B.Winstone, M.E. Giannaccini, J.Rossiter, and N.F. Lepora.The tactip family: Soft optical tactile sensors with 3d-printed biomimetic morphologies.Soft robotics, 5(2):216–227, 2018.
  • Lambeta etal. [2020]M.Lambeta, P.-W. Chou, S.Tian, B.Yang, B.Maloon, V.R. Most, D.Stroud, R.Santos, A.Byagowi, G.Kammerer, etal.Digit: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation.IEEE Robotics and Automation Letters, 5(3):3838–3845, 2020.
  • Lepora etal. [2022]N.F. Lepora, Y.Lin, B.Money-Coomes, and J.Lloyd.Digitac: A digit-tactip hybrid tactile sensor for comparing low-cost high-resolution robot touch.IEEE Robotics and Automation Letters, 7(4):9382–9388, 2022.
  • Gomes et al. [2021]D.F. Gomes, P. Paoletti, and S. Luo. Generation of gelsight tactile images for sim2real learning. IEEE Robotics and Automation Letters, 6(2):4177–4184, Apr. 2021.
  • Wang etal. [2022]S.Wang, M.Lambeta, P.-W. Chou, and R.Calandra.Tacto: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors.IEEE Robotics and Automation Letters, 7(2):3930–3937, 2022.
  • Si and Yuan [2022]Z.Si and W.Yuan.Taxim: An example-based simulation model for gelsight tactile sensors.IEEE Robotics and Automation Letters, pages 2361–2368, 2022.
  • Chen etal. [2023]Z.Chen, S.Zhang, S.Luo, F.Sun, and B.Fang.Tacchi: A pluggable and low computational cost elastomer deformation simulator for optical tactile sensors.IEEE Robotics and Automation Letters, 8(3):1239–1246, 2023.doi:10.1109/LRA.2023.3237042.
  • Church etal. [2021]A.Church, J.Lloyd, R.Hadsell, and N.Lepora.Tactile Sim-to-Real Policy Transfer via Real-to-Sim Image Translation.In Proceedings of the 5th Conference on Robot Learning, pages 1–9. PMLR, Oct. 2021.
  • Jianu etal. [2021]T.Jianu, D.F. Gomes, and S.Luo.Reducing tactile sim2real domain gaps via deep texture generation networks.arXiv preprint arXiv:2112.01807, 2021.
  • Chen etal. [2022]W.Chen, Y.Xu, Z.Chen, P.Zeng, R.Dang, R.Chen, and J.Xu.Bidirectional sim-to-real transfer for gelsight tactile sensors with cyclegan.IEEE Robotics and Automation Letters, 7(3):6187–6194, 2022.
  • Xu etal. [2023]J.Xu, S.Kim, T.Chen, A.R. Garcia, P.Agrawal, W.Matusik, and S.Sueda.Efficient tactile simulation with differentiability for robotic manipulation.In Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 1488–1498. PMLR, 14–18 Dec 2023.URL https://proceedings.mlr.press/v205/xu23b.html.
  • Luu etal. [2023]Q.K. Luu, N.H. Nguyen, etal.Simulation, learning, and application of vision-based tactile sensing at large scale.IEEE Transactions on Robotics, 2023.
  • Lin etal. [2022]Y.Lin, J.Lloyd, A.Church, and N.Lepora.Tactile gym 2.0: Sim-to-real deep reinforcement learning for comparing low-cost high-resolution robot touch.volume7 of Proceedings of Machine Learning Research, pages 10754–10761. IEEE, August 2022.doi:10.1109/LRA.2022.3195195.URL https://ieeexplore.ieee.org/abstract/document/9847020.
  • Lin etal. [2023]Y.Lin, A.Church, M.Yang, H.Li, J.Lloyd, D.Zhang, and N.F. Lepora.Bi-touch: Bimanual tactile manipulation with sim-to-real deep reinforcement learning.IEEE Robotics and Automation Letters, 2023.
  • Yang etal. [2023]M.Yang, Y.Lin, A.Church, J.Lloyd, D.Zhang, D.A. Barton, and N.F. Lepora.Sim-to-real model-based and model-free deep reinforcement learning for tactile pushing.IEEE Robotics and Automation Letters, 2023.
  • Schulman etal. [2017]J.Schulman, F.Wolski, P.Dhariwal, A.Radford, and O.Klimov.Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017.
  • Makoviychuk etal. [2021]V.Makoviychuk, L.Wawrzyniak, Y.Guo, M.Lu, K.Storey, M.Macklin, D.Hoeller, N.Rudin, A.Allshire, A.Handa, etal.Isaac gym: High performance gpu-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021.
  • Brahmbhatt etal. [2019]S.Brahmbhatt, C.Ham, C.C. Kemp, and J.Hays.Contactdb: Analyzing and predicting grasp contact via thermal imaging.In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8709–8719, 2019.
  • Hansen etal. [2019]N.Hansen, Y.Akimoto, and P.Baudis.CMA-ES/pycma on Github.Zenodo, DOI:10.5281/zenodo.2559634, Feb. 2019.URL https://doi.org/10.5281/zenodo.2559634.
  • Makoviichuk and Makoviychuk [2021]D.Makoviichuk and V.Makoviychuk.rl-games: A high-performance framework for reinforcement learning.https://github.com/Denys88/rl_games, May 2021.
  • Kumar etal. [2021]A.Kumar, Z.Fu, D.Pathak, and J.Malik.Rma: Rapid motor adaptation for legged robots.arXiv preprint arXiv:2107.04034, 2021.
  • Lepora [2021]N.F. Lepora.Soft Biomimetic Optical Tactile Sensing With the TacTip: A Review.IEEE Sensors Journal, 21(19):21131–21143, Oct. 2021.ISSN 1530-437X, 1558-1748, 2379-9153.doi:10.1109/JSEN.2021.3100645.
  • Bradski [2000]G.Bradski.The OpenCV Library.Dr. Dobb’s Journal of Software Tools, 2000.

Appendix A Observations and Privileged Information

The full list of real-world observations $o_t$ and privileged information $x_t$ used by the agent is presented in Tables 2 and 3, respectively. The privileged information is used to train the teacher with RL and to obtain the target latent vector $\bar{z}$ during student training; it is not used during deployment. The proprioception and tactile dimensions are in multiples of four, representing the four fingers.

Table 2: Real-world observations $o_t$.

| Name | Symbol | Dimensions |
|---|---|---|
| *Proprioception* | | |
| Joint Position | $q$ | 16 |
| Fingertip Position | $f^p$ | 12 |
| Fingertip Orientation | $f^o$ | 16 |
| Previous Action | $a_{t-1}$ | 16 |
| Target Joint Positions | $\bar{q}$ | 16 |
| *Tactile* | | |
| Binary Contact | $c$ | 4 |
| Contact Pose | $P$ | 8 |
| Contact Force Magnitude | $F$ | 4 |
| *Task* | | |
| Target Rotation Axis | $\hat{k}$ | 3 |

Table 3: Privileged information $x_t$.

| Name | Symbol | Dimensions |
|---|---|---|
| *Object Information* | | |
| Position | $p_o$ | 3 |
| Orientation | $r_o$ | 4 |
| Angular Velocity | $\omega_r$ | 3 |
| Dimensions | $\mathrm{dim}_o$ | 2 |
| Center of Mass | $\mathrm{COM}_o$ | 3 |
| Mass | $m_o$ | 1 |
| Gravity Vector | $\hat{g}$ | 3 |
| *Auxiliary Goal Information* | | |
| Position | $p_g$ | 3 |
| Orientation | $r_g$ | 4 |
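As a quick sanity check, the real-world observation dimensions listed above compose into a single flat policy input. The grouping below mirrors Table 2; the concatenation order is an assumption for illustration.

```python
# Dimensions of each real-world observation group (from Table 2).
OBS_DIMS = {
    "joint_position": 16,          # 16 joints
    "fingertip_position": 12,      # 4 fingers x 3D position
    "fingertip_orientation": 16,   # 4 fingers x quaternion
    "previous_action": 16,
    "target_joint_positions": 16,
    "binary_contact": 4,           # one flag per fingertip
    "contact_pose": 8,             # 4 fingers x 2D contact pose
    "contact_force_magnitude": 4,
    "target_rotation_axis": 3,
}

# Total flat observation size fed to the student policy (assumed layout).
obs_dim = sum(OBS_DIMS.values())
```

Summing the table entries gives a 95-dimensional observation, with all proprioceptive and tactile groups in multiples of four as stated.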

Appendix B Reward Function

B.1 Base Reward

In the following, we explicitly define each term of the reward function used for learning multi-axis object rotation. The full reward function is:

$r = r_{\rm rotation} + r_{\rm contact} + r_{\rm stable} + r_{\rm terminate}$,   (2)

where,

$r_{\rm rotation} = \lambda_{\rm kp} r_{\rm kp} + \lambda_{\rm rot} r_{\rm rot} + \lambda_{\rm goal} r_{\rm goal}$,
$r_{\rm contact} = \lambda_{\rm rew} (\lambda_{\rm gc} r_{\rm gc} + \lambda_{\rm bc} r_{\rm bc})$,
$r_{\rm stable} = \lambda_{\rm rew} (\lambda_{\omega} r_{\omega} + \lambda_{\rm pose} r_{\rm pose} + \lambda_{\rm work} r_{\rm work} + \lambda_{\rm torque} r_{\rm torque})$,
$r_{\rm terminate} = \lambda_{\rm penalty} r_{\rm penalty}$.

Keypoint Distance Reward:

$r_{\rm kp} = \dfrac{1}{e^{a\,d_{\rm kp}} + b + e^{-a\,d_{\rm kp}}}$   (3)

where the keypoint distance $d_{\rm kp} = \frac{1}{N} \sum_{i=1}^{N} \| k^{\rm o}_i - k^{\rm g}_i \|$, and $k^{\rm o}$ and $k^{\rm g}$ are the keypoint positions of the object and goal respectively. We use $N = 6$ keypoints placed 5 cm from the object origin along each of its principal axes, with parameters $a = 50$ and $b = 2.0$.
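The keypoint reward can be sketched directly from these definitions. This is an illustrative implementation assuming the kernel takes the keypoint distance as its argument and is maximal at zero distance; function names are ours, not the paper's.

```python
import math

def keypoint_distance(obj_kps, goal_kps):
    """Mean Euclidean distance between corresponding object and goal
    keypoints (the d_kp term above)."""
    assert len(obj_kps) == len(goal_kps)
    return sum(math.dist(ko, kg) for ko, kg in zip(obj_kps, goal_kps)) / len(obj_kps)

def keypoint_reward(d_kp, a=50.0, b=2.0):
    """Bounded kernel of the keypoint distance: peaks at 1/(2+b) when the
    object keypoints coincide with the goal keypoints, and decays smoothly
    as the distance grows."""
    return 1.0 / (math.exp(a * d_kp) + b + math.exp(-a * d_kp))
```

With $a=50$ and $b=2$, the reward is 0.25 at zero distance and falls off sharply, so only near-goal configurations earn a meaningful dense reward.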

Rotation Reward:

$r_{\rm rot} = \mathrm{clip}(\Delta\Theta \cdot \hat{k};\, -c_1, c_1)$   (4)

The rotation reward is the change in object rotation about the target rotation axis, clipped to the range $\pm c_1 = 0.025$ rad.
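A minimal sketch of this term, assuming $\Delta\Theta$ is the per-step axis-angle rotation increment (3-vector) and $\hat{k}$ a unit target axis (our naming, not the paper's code):

```python
def rotation_reward(delta_theta, axis, c1=0.025):
    """Project the rotation increment onto the target axis and clip to
    +/- c1 rad, so the agent cannot exploit large single-step rotations."""
    proj = sum(d * k for d, k in zip(delta_theta, axis))
    return max(-c1, min(c1, proj))
```

Rotation against the target axis yields a (clipped) negative reward, which discourages spinning the object the wrong way.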

Goal Bonus Reward:

rgoal={1ifkpdist<dtol0otherwisesubscript𝑟goalcases1if𝑘subscript𝑝distsubscript𝑑tolotherwise0otherwiseotherwiser_{\rm goal}=\begin{cases}1\quad{\rm if}\ kp_{\rm dist}<d_{\rm tol}\\0\quad\text{otherwise}\\\end{cases}italic_r start_POSTSUBSCRIPT roman_goal end_POSTSUBSCRIPT = { start_ROW start_CELL 1 roman_if italic_k italic_p start_POSTSUBSCRIPT roman_dist end_POSTSUBSCRIPT < italic_d start_POSTSUBSCRIPT roman_tol end_POSTSUBSCRIPT end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 0 otherwise end_CELL start_CELL end_CELL end_ROW(5)

where the keypoint distance tolerance $d_{\rm tol}$ determines when a goal has been reached.

Good Contact Reward:

$r_{\rm gc}=\begin{cases}1&\text{if }n_{\rm tip\_contact}\geq 2\\0&\text{otherwise}\end{cases}$  (6)

where $n_{\rm tip\_contact}=\mathrm{sum}(c)$. This rewards the agent when the number of tip contacts is greater than or equal to 2, encouraging stable grasping contacts.

Bad Contact Penalty:

$r_{\rm bc}=\begin{cases}1&\text{if }n_{\rm non\_tip\_contact}>0\\0&\text{otherwise}\end{cases}$  (7)

where $n_{\rm non\_tip\_contact}$ is the number of contacts with the object that are not made by a fingertip, accumulated over all contacts in the simulation.

Angular Velocity Penalty:

$r_{\omega}=-\max(\|\omega_{o}\|-\omega_{\max},\,0)$  (8)

where the maximum angular velocity $\omega_{\max}=0.5$. This term penalises the agent when the angular velocity of the object exceeds the maximum.

Pose Penalty:

$r_{\rm pose}=-\|q-q_{0}\|$  (9)

where $q_{0}$ denotes the joint positions of a canonical grasping pose.

Work Penalty:

$r_{\rm work}=-\tau^{T}\bar{q}$  (10)

Torque Penalty:

$r_{\rm torque}=-\|\tau\|$  (11)

where, in the above, $\tau$ is the torque applied to the joints during an action step.

Termination Penalty:

$r_{\rm terminate}=\begin{cases}-1&\text{if }(kp_{\rm dist}>d_{\max})\ \text{or}\ (\hat{k}_{o}>\hat{k}_{\max})\\0&\text{otherwise}\end{cases}$  (12)

Here we define two conditions that terminate an episode. The first represents the object falling out of grasp, for which we use a maximum keypoint distance $d_{\max}=0.1$. The second represents the deviation of the object rotation axis ($\hat{k}_{o}$) from the target rotation axis exceeding a maximum $\hat{k}_{\max}=45^{\circ}$.

The corresponding weights for the reward terms are: $\lambda_{\rm kp}=1.0$, $\lambda_{\rm rot}=5.0$, $\lambda_{\rm goal}=10.0$, $\lambda_{\rm gc}=0.1$, $\lambda_{\rm bc}=0.2$, $\lambda_{\omega}=0.75$, $\lambda_{\rm pose}=0.2$, $\lambda_{\rm work}=2.0$, $\lambda_{\rm torque}=1.0$, $\lambda_{\rm penalty}=50.0$.
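For illustration, the weighted sum over these terms might be assembled as below. This is a sketch, not the paper's exact composition: we fold each penalty's sign into its weight, assuming the penalty terms themselves are passed in as non-negative magnitudes.

```python
# Reward weights listed above; penalty weights carry a negative sign here,
# an assumption made so that all term values can be passed as magnitudes.
WEIGHTS = {"kp": 1.0, "rot": 5.0, "goal": 10.0, "gc": 0.1, "bc": -0.2,
           "omega": -0.75, "pose": -0.2, "work": -2.0, "torque": -1.0,
           "penalty": -50.0}

def total_reward(terms):
    """terms: mapping from a term name in WEIGHTS to its unweighted value."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())
```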

B.2 Alternative Reward

We also formulate an alternative reward function consisting of an angular velocity reward and rotation axis penalty to compare with our auxiliary goal formulation.

Angular Velocity Reward:

$r_{\rm av}=\text{clip}(\omega\cdot\hat{k},\,-c_{2},\,c_{2})$  (13)

where $c_{2}=0.5$.

Rotation Axis Penalty:

$r_{\rm axis}=1-\dfrac{\hat{k}\cdot\hat{k}_{o}}{\|\hat{k}\|\,\|\hat{k}_{o}\|}$  (14)

where $\hat{k}_{o}$ is the current object rotation axis.

We form the new rotation reward $r_{\rm rotation}=\lambda_{\rm av}r_{\rm av}+\lambda_{\rm rot}r_{\rm rot}$. We add an object axis penalty $\lambda_{\rm axis}r_{\rm axis}$ to the $r_{\rm stable}$ term and remove the angular velocity penalty ($\lambda_{\omega}=0$). The weights are $\lambda_{\rm av}=1.5$ and $\lambda_{\rm axis}=1.0$; all other terms of the reward function are kept the same.
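The rotation axis penalty of Eq. 14 is simply one minus the cosine similarity between the target and measured axes; a minimal sketch (function name is ours):

```python
import math

def axis_penalty(k_target, k_obj):
    """1 - cosine similarity between the target rotation axis and the
    current object rotation axis (Eq. 14); zero when perfectly aligned."""
    dot = sum(a * b for a, b in zip(k_target, k_obj))
    norm = (math.sqrt(sum(a * a for a in k_target))
            * math.sqrt(sum(b * b for b in k_obj)))
    return 1.0 - dot / norm
```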

B.3 Adaptive Reward Curriculum

The adaptive reward curriculum is implemented as a linear schedule on the reward curriculum coefficient in $\lambda_{\rm rew}(r_{\rm contact}+r_{\rm stable})$, which increases as successive goals are reached per episode,

$\lambda_{\rm rew}=\dfrac{g_{\rm eval}-g_{\min}}{g_{\max}-g_{\min}}$  (15)

where $[g_{\min},g_{\max}]$ determines the range over which the reward curriculum is active. This shifts the learning objective towards more realistic finger-gaiting motions as the contact and stability rewards increase. We use $[g_{\min},g_{\max}]=[1.0,2.0]$.
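The schedule of Eq. 15 can be sketched as a one-line function; clamping the coefficient to $[0,1]$ outside the active range is our assumption:

```python
def curriculum_coeff(g_eval, g_min=1.0, g_max=2.0):
    """Linear reward-curriculum schedule (Eq. 15); the clamp to [0, 1]
    outside the active range [g_min, g_max] is an assumption."""
    return min(max((g_eval - g_min) / (g_max - g_min), 0.0), 1.0)
```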

Appendix C Grasp Generation

To generate stable grasps, we initialize the object at random orientations 13 cm above the base of the hand and initialize the hand at a canonical grasp pose in the palm-up orientation. We then sample relative offsets to the joint positions from $\mathcal{U}(-0.3,0.3)$ rad. We run the simulation for 120 steps (6 seconds) while sequentially changing the gravity direction through the 6 principal axes of the hand ($\pm xyz$-axes). We save the object orientation and joint positions (10,000 grasp poses per object) if the following conditions are satisfied:
- The number of tip contacts is greater than 2.
- The number of non-tip contacts is zero.
- The total fingertip-to-object distance is less than 0.2.
- The object remains stable for the duration of the episode.
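The four acceptance conditions above can be mirrored in a small filter (illustrative; the names, argument types, and calling convention are ours):

```python
def is_stable_grasp(n_tip_contacts, n_non_tip_contacts,
                    tip_to_object_dist, object_stable):
    """Accept a sampled grasp only if all four saving conditions hold."""
    return (n_tip_contacts > 2            # tip contacts greater than 2
            and n_non_tip_contacts == 0   # no non-tip contacts
            and tip_to_object_dist < 0.2  # total fingertip-object distance
            and object_stable)            # stable for the whole episode
```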

Appendix D Domain Randomization

In addition to the initial grasping pose, target rotation axis, and hand orientation, we include additional domain randomization during teacher and student training to improve sim-to-real performance (shown in Table 4).

Table 4: Domain randomization ranges.

Object
  Capsule Radius (m): [0.025, 0.034]
  Capsule Width (m): [0.000, 0.012]
  Box Width (m): [0.045, 0.06]
  Box Height (m): [0.045, 0.06]
  Mass (kg): [0.025, 0.20]
  Friction: 10.0
  Center of Mass (m): [-0.01, 0.01]
  Disturbance Scale: 2.0
  Disturbance Probability: 0.25
  Disturbance Decay: 0.99

Hand
  Friction: 10.0
  PD Controller Stiffness: $\times\,\mathcal{U}(0.9,1.1)$
  PD Controller Damping: $\times\,\mathcal{U}(0.9,1.1)$
  Observation Joint Noise: 0.03
  Observation Fingertip Position Noise: 0.005
  Observation Fingertip Orientation Noise: 0.01

Tactile
  Observation Pose Noise: 0.0174
  Observation Force Noise: 0.1

Appendix E System Identification

To reduce the sim-to-real gap of the Allegro Hand, we perform system identification to match the simulated robot hand to the real hand. We model each of the 16 DoF of the hand with five parameters (stiffness, damping, mass, friction, and armature), giving a total of 80 parameters to optimize. We collect corresponding trajectories in simulation and the real world in various hand orientations and use CMA-ES [55] to find the simulation parameters that minimize the mean-squared error between the trajectories.
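The fitting loop has roughly the following shape. In this sketch a naive random search stands in for CMA-ES, and `simulate` is a hypothetical rollout function returning joint trajectories for a given parameter vector:

```python
import random

def trajectory_mse(sim_traj, real_traj):
    """Mean-squared error between corresponding joint trajectories."""
    n = sum(len(t) for t in real_traj)
    return sum((s - r) ** 2
               for st, rt in zip(sim_traj, real_traj)
               for s, r in zip(st, rt)) / n

def fit_parameters(simulate, real_traj, init_params, iters=100, seed=0):
    """Hill-climb simulation parameters to match real trajectories.
    (The paper uses CMA-ES; random perturbation is a stand-in here.)"""
    rng = random.Random(seed)
    best_p = list(init_params)
    best_e = trajectory_mse(simulate(best_p), real_traj)
    for _ in range(iters):
        cand = [p * (1.0 + rng.gauss(0.0, 0.1)) for p in best_p]
        err = trajectory_mse(simulate(cand), real_traj)
        if err < best_e:
            best_p, best_e = cand, err
    return best_p, best_e
```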

Appendix F Simulated Tactile Processing

To simulate our soft tactile sensor in a rigid-body simulator, we process the contact information received from the simulator to construct the tactile observations. We use contact force information to compute binary contact signals:

$c=\begin{cases}1&\text{if }\|\mathbf{F}\|>0.25\,\text{N}\\0&\text{otherwise}\end{cases}$  (16)

A contact force threshold of 0.25 N was selected to simulate the binary contact detection of the real sensor. For contact force information, we simulate sensing delay caused by elastic deformation of the soft tip in the real world by applying an exponential average on the received force readings.

$\mathbf{F}=\alpha F_{t}+(1-\alpha)F_{t-1}$  (17)

where we use $\alpha=0.5$. We then apply a saturation limit and rescaling to align simulated contact force sensing ranges with the ranges experienced in the real world.

$\mathbf{F}=\beta_{F}\,\mathrm{clip}(F,\ F_{\min},\ F_{\max})$  (18)

where we use $\beta_{F}=0.6$, $F_{\min}=0.0$ N, $F_{\max}=5.0$ N. We apply the same saturation and rescaling to the contact pose.

$\mathbf{P}=\beta_{P}\,\mathrm{clip}(P,\ P_{\min},\ P_{\max})$  (19)

where we use $\beta_{P}=0.6$, $P_{\min}=-0.53$ rad, $P_{\max}=0.53$ rad. We use the binary contact signals to mask the contact pose and contact force observations, minimizing noise in the tactile feedback. The same masking is applied in the real world.
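Putting Eqs. 16-19 together, the simulated tactile processing can be sketched as below. Per-component clipping and the order of masking relative to rescaling are our assumptions:

```python
import math

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def process_contact(force_xyz, prev_force, pose_xy, alpha=0.5,
                    thresh=0.25, beta_f=0.6, f_lim=(0.0, 5.0),
                    beta_p=0.6, p_lim=(-0.53, 0.53)):
    """Binary contact (Eq. 16), EMA force smoothing (Eq. 17), and
    clip/rescale of force (Eq. 18) and pose (Eq. 19), with masking."""
    f = [alpha * ft + (1 - alpha) * fp
         for ft, fp in zip(force_xyz, prev_force)]                  # Eq. 17
    contact = 1 if math.sqrt(sum(x * x for x in f)) > thresh else 0  # Eq. 16
    f_obs = [beta_f * clip(x, *f_lim) for x in f]                   # Eq. 18
    p_obs = [beta_p * clip(x, *p_lim) for x in pose_xy]             # Eq. 19
    if not contact:  # mask pose/force observations with the binary signal
        f_obs, p_obs = [0.0] * len(f_obs), [0.0] * len(p_obs)
    return contact, f_obs, p_obs
```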

Appendix G Architecture and Policy Training

The network architecture and training hyperparameters are shown in Table 5. The proprioception policy uses an observation input dimension of $N=79$; binary touch uses $N=83$ and full touch $N=95$. We use a history of 30 time steps as input to the temporal convolutional network (TCN) and encode the privileged information into a latent vector of size $n=8$ for all policies.

Table 5: Network architecture and training hyperparameters.

Teacher
  MLP Input Dim: 18
  MLP Hidden Units: [256, 128, 8]
  MLP Activation: ReLU
  Policy Hidden Units: [512, 256, 128]
  Policy Activation: ELU
  Learning Rate: $5\times10^{-3}$
  Num Envs: 8192
  Rollout Steps: 8
  Minibatch Size: 32768
  Num Mini Epochs: 5
  Discount: 0.99
  GAE $\tau$: 0.95
  Advantage Clip $\epsilon$: 0.2
  KL Threshold: 0.02
  Gradient Norm: 1.0
  Optimizer: Adam
  Goal Update $d_{\rm tol}$: 0.15

Student
  TCN Input Dim: [30, N]
  TCN Hidden Units: [N, N]
  TCN Filters: [N, N, N]
  TCN Kernel: [9, 5, 5]
  TCN Stride: [2, 1, 1]
  TCN Activation: ReLU
  Latent Vector Dim $z$: 8
  Policy Hidden Units: [512, 256, 128]
  Policy Activation: ELU
  Learning Rate: $3\times10^{-4}$
  Num Envs: 8192
  Batch Size: 8192
  Num Mini Epochs: 1
  Optimizer: Adam
  Goal Update $d_{\rm tol}$: 0.25
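The student consumes a rolling 30-step observation window of width $N$; a minimal buffer sketch (the class name and zero-initialization are ours):

```python
from collections import deque

class ObsHistory:
    """Rolling observation window feeding the student TCN
    (input shape [30, N]; N = 79, 83, or 95 by tactile mode)."""
    def __init__(self, n_obs, horizon=30):
        self.buf = deque([[0.0] * n_obs for _ in range(horizon)],
                         maxlen=horizon)

    def push(self, obs):
        """Append the newest observation and return the full window."""
        self.buf.append(list(obs))
        return [row[:] for row in self.buf]  # oldest first
```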

Appendix H Tactile Observation Model

Data Collection. The setup for tactile feature extraction is shown in Figure 10. We collect data by tapping and shearing the sensor on a flat stimulus fixed onto a force-torque sensor, collecting six labels for training: contact depth $z$, contact pose in $R_{x}$ and $R_{y}$, and contact forces $F_{x}$, $F_{y}$, and $F_{z}$. To capture the contact features needed for the in-hand object rotation task, we sample sensor poses within the ranges shown in Table 6, which provide sensing ranges for contact pose between $[-28^{\circ},28^{\circ}]$ and contact forces of up to 5 N.

Training. The architecture and training parameters of the observation model are shown in Table 7. For each fingertip sensor, we collect 3000 images (2400 train, 600 test) and train separate models. The prediction error for one of the sensors is shown in Figure 11.

Table 6: Sampled sensor pose ranges.

  Depth $z$ (mm): [-1, -4]
  Shear $S_{x}$ (mm): [-2, 2]
  Shear $S_{y}$ (mm): [-2, 2]
  Rotation $R_{x}$ (deg): [-28, 28]
  Rotation $R_{y}$ (deg): [-28, 28]
Table 7: Observation model architecture and training parameters.

  Conv Input Dim: [240, 135]
  Conv Filters: [32, 32, 32, 32]
  Conv Kernel: [11, 9, 7, 5]
  Conv Stride: [1, 1, 1, 1]
  Max Pooling Kernel: [2, 2, 2, 2]
  Max Pooling Stride: [2, 2, 2, 2]
  Output Dim: 6
  Batch Normalization: True
  Activation: ReLU
  Learning Rate: $1\times10^{-4}$
  Batch Size: 16
  Num Epochs: 100
  Optimizer: Adam
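Assuming unpadded (valid) convolutions, each conv block from Table 7 followed by 2×2 max pooling shrinks the 240×135 input as computed below. The padding scheme is not stated in the paper, so this is a sketch of the shape arithmetic rather than the exact architecture:

```python
def block_out(size, kernel, stride=1, pool=2):
    """Spatial size after one valid conv + 2x2 max-pool block."""
    return ((size - kernel) // stride + 1) // pool

h, w = 240, 135
for k in (11, 9, 7, 5):  # the four conv kernels from Table 7
    h, w = block_out(h, k), block_out(w, k)
# h, w now hold the final feature-map size before the 6-dim output head
```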


Appendix I Tactile Image Processing

The tactile sensors provide raw RGB images from the camera module. We use an exposure setting of 312.5 and a resolution of $640\times480$, providing a frame rate of up to 30 FPS. The images are then post-processed to compute the tactile observations. For all tactile observations, we convert the raw image to greyscale and rescale it to $240\times135$.
Binary Contact: We apply a median blur filter with an aperture size of 11, followed by adaptive thresholding with a block size of 55 pixels and a constant offset of -2. These operations smooth the image and filter out unwanted noise. The post-processed image is compared with a reference image using the Structural Similarity Index (SSIM) to compute a binary contact signal (0 or 1); we use an SSIM threshold of 0.6 for contact detection.
Contact Pose and Force: We directly use the resized greyscale image for contact force and pose prediction. From the target labels, we use the contact pose ($R_{x}$, $R_{y}$) and the contact force components $F_{x}$, $F_{y}$, $F_{z}$ (to compute the contact force magnitude $\|F\|$) to construct the dense tactile representation used during policy training. We use the binary contact signal to mask contact pose and force, thresholding the predictions at $\approx 0.25$ N.

Appendix J Real-world Deployment

Tactile Sensor Design. The design of the sensor is based on the DigiTac version [39] of the TacTip [37, 58], a soft optical tactile sensor that infers contact information from the motion of marker-tipped pins beneath its sensing surface. Here, we have redesigned the DIGIT base to be more compact, with a new PCB, a modular camera, and a lighting system (Figure 13). We also improved the morphology of the skin and base connector to provide a larger and smoother sensing surface for greater fingertip dexterity. The tactile sensor skin and base are entirely 3D printed, using Agilus 30 for the skin and Vero-series materials for the pin-tip markers and the casings. Each base contains a camera driver board that connects to the computer via USB and can be streamed asynchronously at a frame rate of 30 FPS. We perform post-processing in real time using OpenCV [59].

Sensor Placement. As the tactile fingertips are primarily sensorized over a front-facing area, we experimented with different orientations relative to the fingers to maximize contact sensing during in-hand object rotation. We placed the tactile fingertips with offsets (thumb, index, middle, ring) = ($-45^{\circ}$, $-45^{\circ}$, $0^{\circ}$, $45^{\circ}$).

Control Pipeline. Each tactile observation model is deployed together with the policy as shown in Figure 13. We stream tactile and proprioception readings asynchronously at 20 Hz. The joint positions are used by a forward kinematic solver to compute fingertip position and orientation. The relative joint positions obtained from the policy are converted to target joint commands. This is published to the Allegro Hand and converted to torque commands by a PD controller at 300 Hz.
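The relative-action step described above amounts to the following conversion (a sketch; the optional joint-limit clamping is our assumption):

```python
def to_target_joints(q_now, delta_q, q_limits=None):
    """Convert the policy's relative joint-position action into an
    absolute target joint command for the PD controller."""
    target = [q + d for q, d in zip(q_now, delta_q)]
    if q_limits is not None:  # optional clamping to per-joint limits
        target = [max(lo, min(hi, t))
                  for t, (lo, hi) in zip(target, q_limits)]
    return target
```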


Object Properties. Various physical properties of the objects used in the real-world experiment are shown in Table 8. We include objects of different sizes and shapes not seen during training.

Table 8: Real-world object set.

  Object           Dimensions (mm)   Mass (g)
  Plastic Apple    75 × 75 × 70      60
  Plastic Orange   70 × 72 × 72      52
  Plastic Pepper   61 × 68 × 65      10
  Plastic Peach    62 × 56 × 55      30
  Plastic Lemon    52 × 52 × 65      33
  Tin Cylinder     45 × 45 × 63      30
  Cube             51 × 51 × 51      65
  Gum Box          90 × 80 × 76      89
  Container        90 × 80 × 76      32
  Rubber Toy       80 × 53 × 48      27

Appendix K Additional Experiments

K.1 Hyperparameters

We provide additional ablation studies to analyze the design choices of our auxiliary goal formulation. The effects of the goal update tolerance $d_{\rm tol}$ used during student training and of the goal increment interval are shown in Table 9.

Performance is significantly affected by the goal-update tolerance. As the tolerance was reduced during student training, the number of average rotations and successive goals reached per episode also reduced, suggesting that the teacher policy transferred poorly and the student could not learn the multi-axis object rotation skill effectively. Increasing the goal increment interval also resulted in fewer rotations achieved.

Table 9: Ablations on the goal update tolerance and goal increment interval.

  Goal Update Tolerance        Rot    TTT (s)   #Success
  $d_{\rm tol}=0.15$           0.75   28.1      3.07
  $d_{\rm tol}=0.20$           1.36   27.7      4.48
  $d_{\rm tol}=0.25$ (ours)    1.77   27.2      5.26

  Goal Increment Interval      Rot    TTT (s)   #Success
  $\theta=50^{\circ}$          1.30   27.1      3.86
  $\theta=40^{\circ}$          1.50   26.7      4.36
  $\theta=30^{\circ}$ (ours)   1.77   27.2      5.26

K.2 Rotating Hand

We test the robustness of the policy by performing in-hand object rotation during different hand movements. In particular, we choose hand trajectories where the gravity vector is continuously changing relative to the orientation of the hand, adding greater complexity to this task. Rollouts for three different hand trajectories are shown in Figure 14. In particular, for the third hand trajectory (iii), we demonstrate the capability of the robot hand to servo around the surface of the object in different directions while keeping the object almost stationary in free space. This motion also demonstrates the ability to command different target rotation axes during deployment, offering a useful set of primitives for other downstream tasks.


References
