AnyRotate: Gravity-Invariant In-Hand Object Rotation with Sim-to-Real Touch (2024)

Max Yang1,  Chenghua Lu1,  Alex Church2,  Yijiong Lin1,  Chris Ford1,  Haoran Li1,
Efi Psomopoulou1,  David A.W. Barton1*, Nathan F. Lepora1*
1University of Bristol   2Cambrian Robotics
https://maxyang27896.github.io/anyrotate/

Abstract

Human hands are capable of in-hand manipulation in the presence of different hand motions. For a robot hand, harnessing rich tactile information to achieve this level of dexterity remains a significant challenge. In this paper, we present AnyRotate, a system for gravity-invariant multi-axis in-hand object rotation using dense featured sim-to-real touch. We tackle this problem by training a dense tactile policy in simulation and present a sim-to-real method for rich tactile sensing to achieve zero-shot policy transfer. Our formulation allows the training of a unified policy to rotate unseen objects about arbitrary rotation axes in any hand direction. In our experiments, we highlight the benefit of capturing detailed contact information when handling objects with varying properties. Interestingly, despite not having explicit slip detection, we found that rich multi-fingered tactile sensing can implicitly detect object movement within the grasp and provide a reactive behavior that improves the robustness of the policy.

* These authors contributed equally.
Correspondence to max.yang@bristol.ac.uk.
[Figure 1]

Keywords: Tactile Sensing, In-hand Object Rotation, Reinforcement Learning

1 Introduction

The versatility of manipulating objects of varying shapes and sizes has been a long-standing goal of robot manipulation[1]. However, in-hand manipulation with multi-fingered hands can be hugely challenging due to the high degree of actuation, fine motor control, and large environment uncertainties. While significant advances have been made in recent years, most prominently the work by OpenAI[2, 3], these have relied primarily on vision-based systems, which are not necessarily well suited to this task due to significant self-occlusion. Overcoming these issues often requires multiple cameras and complicated setups that are not representative of natural embodiment.

More recently, researchers have begun to explore the object rotation problem with proprioception and touch sensing[4, 5], treating it as a representative task of general in-hand manipulation. The ability to rotate objects around any chosen axis in any hand orientation provides a useful set of primitives crucial for manipulating objects freely in space, even while the hand is in motion. However, this is challenging as it requires comprehension of complex interaction physics that cannot be visually observed, coupled with high-precision control of secure grasps in the presence of gravity (i.e., gravity invariance): it is harder to hold an object while manipulating it if the palm is not facing upwards, which has been the scenario in related work[2, 3, 4, 5]. Tactile sensing is expected to play a key role here as it enables the capture of detailed contact information to better control the robot-object interaction. However, rich tactile sensing for in-hand dexterous manipulation has not yet been fully exploited due to the large sim-to-real gap, often leading to a reduction of high-resolution tactile data to low-dimensional representations[6, 7]. One might expect that a more detailed tactile representation could increase in-hand dexterity and enable new tasks.

In this paper, we introduce AnyRotate: a robot system for performing multi-axis gravity-invariant in-hand object rotation with dense featured sim-to-real touch. Here, we propose to tackle this task with goal-conditioned RL and rich tactile sensing. We first present our sim-to-real formulation and dense tactile representation to train an accurate and precise policy for multi-axis object rotation. We then train an observation model to simultaneously predict continuous contact pose and contact force readings from tactile images to capture important features for precise manipulation under noisy conditions. In the real world, we mount tactile sensors onto the fingertips of a four-fingered fully-actuated robot hand to provide rich tactile feedback for performing stable in-hand object rotation.

Our principal contributions are, in summary:
1) An RL formulation using auxiliary goals for end-to-end learning of a unified policy, capable of achieving multi-axis in-hand object rotation in arbitrary hand orientations relative to gravity.
2) A dense tactile representation for learning in-hand manipulation. We highlight the benefit of acquiring detailed contact information for handling unseen objects with various physical properties.
3) We achieve zero-shot sim-to-real policy transfer and validate on 10 diverse objects in the real world. Our policy demonstrates strong robustness across various hand directions and rotation axes, and maintains high performance even when deployed on a rotating hand.

2 Related Work

Classical Control. Due to the complexity of the contact physics in dexterous manipulation, work on this topic has traditionally relied on planning and control using simplified models[8, 9, 10, 11, 12, 13, 14, 15]. With improved hardware and design, these methods have continued to demonstrate an increased level of dexterity[16, 17, 18, 19, 20, 21, 22, 23, 24]. While these methods offer performance guarantees, they are often limited by the underlying assumptions.

Dexterous Manipulation. With advances in machine learning, learning-based control has become a popular approach to achieving dexterity[3, 2, 25, 26]. However, most prior works rely on vision as the primary sense for object manipulation[27, 28, 29, 30], which requires continuous tracking of the object in a highly dynamic scene where occlusions can degrade performance. Vision also has difficulty capturing local contact information, which may be crucial for contact-rich tasks.

More recently, researchers have explored the in-hand object rotation task using proprioception and touch sensing[4, 7, 5]. This has so far been limited to rotation about the primary axes or training separate policies for arbitrary rotation axes with an upward-facing hand[31, 6]. In-hand manipulation in different hand orientations can be challenging as the hand must perform finger-gaiting while keeping the object stable against gravity. Several works[32, 33, 34, 35] achieved manipulation with a downward-facing hand using either a gravity curriculum or precision grasp manipulation, but the policies were still limited to a single hand orientation. In this work, we make a significant advance by training a unified policy to rotate objects about any chosen rotation axis in any hand direction, and for the first time achieve in-hand manipulation with a continuously moving and rotating hand.

Tactile Sensing. Vision-based tactile sensors have become increasingly popular due to their affordability, compactness, and ability to provide precise and detailed spatial information about contact through high-resolution tactile images[36, 37, 38, 39]. However, this fine-grained local contact information has not yet been fully utilized for in-hand dexterous manipulation. Previous studies have reduced the high-resolution tactile images to binary contact[7] or discretized contact location[6] to reduce the sim-to-real gap. In contrast, our system utilizes a dense featured tactile representation consisting of the full contact pose and contact force. We show that this tactile representation can capture important interaction physics that is valuable for dexterous manipulation under unknown disturbances, outperforming prior baselines.

Sim-to-real Methods. Learning in simulation for tactile robotics has gained appeal as it avoids the practical limitations of large data collection in real-world interactions. This trend has been driven by advancements in high-fidelity tactile simulators[40, 41, 42, 43], as well as various sim-to-real approaches[44, 45, 46, 47, 48]. Several works have proposed using high-frequency rendering of tactile images for sim-to-real RL[44, 49, 50]. However, this can be computationally expensive and inefficient, limiting these methods to simpler robotic systems. In this work, we extend the sim-to-real framework of Yang et al.[51] by proposing an approach to predict full contact pose and contact force, and apply it to a dexterous manipulation task with a robot hand.

3 Method

We perform in-hand object rotation via stable precision grasping, a process for continuous object rotation without a supporting surface. Gravity invariance is addressed by randomly initializing the hand orientation between episodes. This is a difficult exploration problem: the fingers and object must move while constantly maintaining stability, as any finger misplacement can induce slip and lead to an irreversible state where the object is dropped. To obtain a general policy for multi-axis in-hand object rotation, we formulate the object-rotation problem as object reorientation and adopt a two-stage learning process. First, a teacher is trained with privileged information and reinforcement learning[52], using an auxiliary goal formulation and an adaptive curriculum for sample-efficient training. A student is then trained via supervised learning to imitate the teacher's policy given only real-world observations. During both stages, we provide the agent with rich tactile feedback. To bridge the sim-to-real gap for rich tactile sensing, we collect contact data to train an observation model, which allows zero-shot policy transfer to the real world. An overview of the method is shown in Figures 2 and 3.

3.1 Multi-axis In-hand Object Rotation

[Figure 2]

We formulate the task as a finite-horizon goal-conditioned Markov Decision Process (MDP) $\mathcal{M}=(\mathcal{S},\mathcal{A},\mathcal{R},\mathcal{P},\mathcal{G})$, defined by a continuous state space $s\in\mathcal{S}$, a continuous action space $a\in\mathcal{A}$, a probabilistic state transition function $p(s_{t+1}|s_t,a_t)\in\mathcal{P}$, a goal $g\in\mathcal{G}$, and a reward function $r\in\mathcal{R}:\mathcal{S}\times\mathcal{A}\times\mathcal{G}\rightarrow\mathbb{R}$. At each time step $t$, the agent selects an action $a_t$ from the current policy $\pi(a_t|s_t,g)$ and receives a reward $r$.
The aim is to obtain a policy $\pi_\theta^*$, parameterized by $\theta$, that maximizes the expected return $\mathbb{E}_{\tau\sim p_\pi(\tau),\,g\sim q(g)}\left[\sum_{t=0}^{T}\gamma^t r(s_t,a_t,g)\right]$ over an episode $\tau$.

Observations. The observation $O_t$ contains the current and target joint positions $q_t,\bar{q}_t\in\mathbb{R}^{16}$, the previous action $a_{t-1}\in\mathbb{R}^{16}$, fingertip positions $f^p_t\in\mathbb{R}^{12}$, fingertip orientations $f^r_t\in\mathbb{R}^{16}$, binary contacts $c_t\in\{0,1\}^4$, contact poses $P_t\in\mathbb{S}^4$, contact force magnitudes $F_t\in\mathbb{R}^4$, and the desired rotation axis $\hat{k}\in\mathbb{S}^2$.
The privileged information provided to the teacher includes object position, orientation, angular velocity, dimensions, gravity force vector, and the current goal orientation.

Action Space. At each time step, the action output from the policy is $a_t:=\Delta\theta\in\mathbb{R}^{16}$, the relative joint positions of the robot hand. To encourage smooth finger motion, we apply an exponential moving average to compute the target joint positions, defined as $\bar{q}_t=\bar{q}_{t-1}+\tilde{a}_t$, where $\tilde{a}_t=\eta a_t+(1-\eta)a_{t-1}$. We control the hand at 20 Hz and limit the action to $\Delta\theta\in[-0.026, 0.026]^{16}$ rad.
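The relative-action update above can be sketched as follows. Only the update rule and the 0.026 rad limit come from the text; the EMA coefficient value and helper names are our own illustrative choices.

```python
import numpy as np

NUM_JOINTS = 16
ACTION_LIMIT = 0.026  # rad, per-step relative joint limit (from the text)
ETA = 0.8             # EMA coefficient eta (illustrative value)

def smooth_action(a_t, a_prev, eta=ETA):
    """a_tilde_t = eta * a_t + (1 - eta) * a_{t-1}."""
    return eta * a_t + (1.0 - eta) * a_prev

def target_joint_positions(q_bar_prev, a_t, a_prev):
    """q_bar_t = q_bar_{t-1} + a_tilde_t, with actions clipped to the limit."""
    a_t = np.clip(a_t, -ACTION_LIMIT, ACTION_LIMIT)
    a_tilde = smooth_action(a_t, a_prev)
    return q_bar_prev + a_tilde
```

Filtering the action rather than the joint target keeps the per-step joint change bounded while still low-pass filtering the policy output.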

Simulated Touch. We approximate the sensor as a rigid body and fetch the contact information from its sensing surface: the local contact position $(c_x, c_y, c_z)$ for computing contact pose, and the net contact force $(F_x, F_y, F_z)$ for computing contact force magnitude. We apply an exponential moving average to the contact force readings to simulate sensing delay due to elastic deformation. We also saturate and re-scale the contact values to the sensing ranges experienced in reality. The contact force is used to compute binary contact signals using a threshold similar to that of the real sensor.
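A minimal sketch of this contact-signal processing, assuming the steps described above (EMA filtering, saturation/re-scaling, thresholding); all constants and names here are illustrative, not the paper's values:

```python
import numpy as np

FORCE_MAX = 5.0       # N, saturation matching the real sensor range (illustrative)
CONTACT_THRESH = 0.1  # N, binary-contact threshold (illustrative)
ALPHA = 0.5           # EMA coefficient simulating elastic sensing delay (illustrative)

def process_contact_force(f_raw, f_filt_prev, alpha=ALPHA):
    """EMA-filter the net contact force magnitude, then saturate and
    re-scale to [0, 1]; also derive the binary contact signal."""
    f_mag = np.linalg.norm(f_raw, axis=-1)           # |(Fx, Fy, Fz)|
    f_filt = alpha * f_mag + (1.0 - alpha) * f_filt_prev
    f_scaled = np.clip(f_filt, 0.0, FORCE_MAX) / FORCE_MAX
    binary_contact = (f_filt > CONTACT_THRESH).astype(np.float32)
    return f_scaled, f_filt, binary_contact
```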

Auxiliary Goal. For training a unified policy for multi-axis rotation, a formulation using angular velocity leads to inefficient training and convergence difficulties, as will be shown in Section 5.1. Instead, we formulate the problem as object reorientation to a moving target. Targets are generated by rotating the current object orientation about the desired rotation axis at regular intervals. When a target is reached, a new one is generated about the rotation axis until the episode ends.
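The goal-generation step can be sketched with quaternions as follows; the interval size and function names are illustrative assumptions, and we assume the rotation axis is expressed in the world frame:

```python
import numpy as np

GOAL_INTERVAL = np.deg2rad(45)  # rotation increment per auxiliary goal (illustrative)

def quat_mul(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def next_goal(obj_quat, axis, angle=GOAL_INTERVAL):
    """Next auxiliary goal: the current object orientation rotated by
    `angle` about the desired (world-frame) rotation axis."""
    axis = np.asarray(axis) / np.linalg.norm(axis)
    dq = np.concatenate(([np.cos(angle / 2.0)], np.sin(angle / 2.0) * axis))
    return quat_mul(dq, obj_quat)  # pre-multiply: rotation in world frame
```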

Reward Design. In the following, we provide an intuitive explanation of the goal-based reward used for learning multi-axis object rotation (full details in Appendix B):

$$r = r_{\rm rotation} + r_{\rm contact} + r_{\rm stable} + r_{\rm terminate} \qquad (1)$$

The object rotation objective is defined by $r_{\rm rotation}$. We use a keypoint formulation $\mathcal{K}(\|k^{\rm o}_i - k^{\rm g}_i\|)$ to define target poses[29] and apply a keypoint distance threshold to provide a goal update tolerance $d_{\rm tol}$. We augment this reward with a sparse bonus when a goal is reached and a delta rotation reward to encourage continuous rotation. Next, we use $r_{\rm contact}$ to maximize contact sensing, rewarding tip contacts and penalizing contacts with any other parts of the hand. We also include several terms to encourage stable rotations, $r_{\rm stable}$, comprising: an object angular velocity penalty; a hand-pose penalty on the distance of the joint positions from a canonical pose; a controller work-done penalty; and a controller torque penalty. Finally, we include an early termination penalty, $r_{\rm terminate}$, applied if the object falls out of the grasp or the rotation axis deviates too far from the desired axis.
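The keypoint term and goal-update check can be sketched as below. The exponential shaping, bonus magnitude, and tolerance value are illustrative stand-ins, not the coefficients from Appendix B:

```python
import numpy as np

D_TOL = 0.025  # goal update tolerance on keypoint distance (illustrative, metres)

def keypoint_distance(kp_obj, kp_goal):
    """Mean Euclidean distance between corresponding object and goal keypoints."""
    return np.mean(np.linalg.norm(kp_obj - kp_goal, axis=-1))

def rotation_reward(kp_obj, kp_goal, bonus=250.0, scale=2.0):
    """Shaped keypoint reward plus a sparse bonus when the goal is reached
    (i.e. all keypoints within the tolerance d_tol on average)."""
    d = keypoint_distance(kp_obj, kp_goal)
    goal_reached = d < D_TOL
    reward = np.exp(-scale * d) + (bonus if goal_reached else 0.0)
    return reward, goal_reached
```

When `goal_reached` is true, the environment would generate the next auxiliary goal about the desired rotation axis.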

Adaptive Curriculum. The precision-grasp object rotation task can be separated into two key phases of learning: first, stably grasping the object in different hand orientations; then, rotating the object stably about the desired rotation axis. While the $r_{\rm contact}$ and $r_{\rm stable}$ reward terms are beneficial for sim-to-real transfer of the final policy, they can hinder the learning process, resulting in a local optimum where the object is stably grasped without being rotated. To alleviate this issue, we apply a reward curriculum coefficient, $\lambda_{\rm rew}(r_{\rm contact} + r_{\rm stable})$, which increases linearly with the average number of rotations achieved per episode.
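A minimal sketch of this curriculum, assuming a linear ramp that saturates at 1; the ramp rate is an illustrative assumption:

```python
def curriculum_coefficient(avg_rotations, ramp=2.0):
    """Reward-curriculum coefficient lambda_rew, growing linearly with the
    average rotations per episode and saturating at 1 (ramp is illustrative)."""
    return min(max(avg_rotations / ramp, 0.0), 1.0)

def shaped_reward(r_rotation, r_contact, r_stable, r_terminate, avg_rotations):
    """Total reward with the curriculum applied to the stability terms only,
    so early training is dominated by the rotation objective."""
    lam = curriculum_coefficient(avg_rotations)
    return r_rotation + lam * (r_contact + r_stable) + r_terminate
```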

3.2 Teacher-Student Policy Distillation

The training in Section 3.1 uses privileged information, such as object properties and the auxiliary goal pose. Similar to previous work[5, 6], we use policy distillation to train a student that relies only on proprioception and tactile feedback. The student policy has the same actor-critic architecture as the teacher policy, $a_t = \pi_\theta(O_t, a_{t-1}, z_t)$, and returns a Gaussian distribution with diagonal covariance, $a_t \sim \mathcal{N}(\mu_\theta, \Sigma_\theta)$. The latent vector $z_t = \phi(O_t, O_{t-1}, \dots, O_{t-n})$ is a low-dimensional encoding predicted from a sequence of $n$ proprioceptive and tactile observations. We use a temporal convolutional network (TCN) encoder for the latent vector function $\phi(\cdot)$.

Training. The student encoder is randomly initialized and the policy network is initialized with the weights of the teacher policy. We train both the encoder and policy network via supervised learning, minimizing the mean squared error (MSE) between the latent vectors $z_t$ and $\bar{z}_t$ and the negative log-likelihood (NLL) between the action distributions $a_t$ and $\bar{a}_t$. Without explicit object or goal information, we found the student policy unable to achieve the same level of goal-reaching accuracy as the teacher, which can lead to missed goals and out-of-distribution data being collected. To alleviate this issue, we increase the goal update tolerance $d_{\rm tol}$ during student training.
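The combined distillation objective can be sketched as follows (numpy for clarity rather than an autodiff framework; the loss weighting and function names are illustrative):

```python
import numpy as np

def gaussian_nll(mu, log_std, target):
    """Negative log-likelihood of `target` under a diagonal Gaussian
    with mean `mu` and per-dimension log standard deviation `log_std`."""
    var = np.exp(2.0 * log_std)
    return 0.5 * np.sum((target - mu) ** 2 / var + 2.0 * log_std + np.log(2.0 * np.pi))

def distillation_loss(z_student, z_teacher, mu_s, log_std_s, a_teacher, w_latent=1.0):
    """Student loss: MSE between latent vectors plus NLL of the teacher's
    action under the student's action distribution (weight illustrative)."""
    mse = np.mean((z_student - z_teacher) ** 2)
    nll = gaussian_nll(mu_s, log_std_s, a_teacher)
    return w_latent * mse + nll
```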

3.3 Dense Featured Touch

[Figure 3]

The dense tactile observations consist of contact pose and contact force. We use spherical coordinates defined by the contact pose variables: polar angle $R_x$ and azimuthal angle $R_y$. The contact force variable is the magnitude of the 3D contact force, $\|F\|$. For sim-to-real transfer, we train an observation model to extract these features from tactile images and perform zero-shot policy transfer[51].
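In simulation these features can be computed directly from the contact quantities of Section 3.1; a minimal sketch, assuming the sensor's outward axis is local z (the frame convention is our assumption, not stated in the text):

```python
import numpy as np

def dense_touch_features(c_pos, f_vec):
    """Map a local contact position on the sensing surface to spherical
    contact-pose angles (Rx: polar, Ry: azimuthal), and the net contact
    force vector to its magnitude ||F||."""
    cx, cy, cz = c_pos
    r = np.linalg.norm(c_pos)
    Rx = np.arccos(cz / r)      # polar angle from the sensor's outward (z) axis
    Ry = np.arctan2(cy, cx)     # azimuthal angle in the sensor's tangent plane
    F = np.linalg.norm(f_vec)   # contact force magnitude
    return Rx, Ry, F
```

In the real world, the same $(R_x, R_y, \|F\|)$ triplet is instead predicted from tactile images by the observation model.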

Data Collection. We use a 6-DoF UR5 robot arm with the tactile sensor attached to the end effector and an F/T sensor placed on the workspace platform. The tactile sensor is moved over the surface of a flat stimulus mounted above the F/T sensor, at randomly sampled poses. For each interaction, we store tactile images with the corresponding pose and force labels. We then train a CNN model to extract these explicit contact features from the tactile images. More details are given in Appendix H.

Deployment. We apply a Structural Similarity Index (SSIM) threshold between the current and reference tactile images to compute binary contact, which is also used to mask the contact pose and force predictions. Given tactile images from each fingertip, we use the observation models to obtain the dense contact features, which are then used as tactile observations for the policy. An overview of the tactile prediction pipeline is shown in Figure 3.
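The SSIM-based contact test can be sketched as below. Standard windowed SSIM implementations exist (e.g. in scikit-image); for self-containment we use a single-window global SSIM, and the threshold value is an illustrative assumption:

```python
import numpy as np

SSIM_THRESH = 0.6  # below this similarity to the no-contact reference,
                   # we flag contact (illustrative value)

def global_ssim(img_a, img_b, data_range=255.0):
    """Global (single-window) SSIM between two grayscale images; a
    lightweight stand-in for a full windowed SSIM implementation."""
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_a, mu_b = img_a.mean(), img_b.mean()
    var_a, var_b = img_a.var(), img_b.var()
    cov = ((img_a - mu_a) * (img_b - mu_b)).mean()
    return ((2 * mu_a * mu_b + c1) * (2 * cov + c2)) / \
           ((mu_a ** 2 + mu_b ** 2 + c1) * (var_a + var_b + c2))

def binary_contact(tactile_img, reference_img, thresh=SSIM_THRESH):
    """Contact is detected when the current image diverges sufficiently
    from the no-contact reference image."""
    return global_ssim(tactile_img, reference_img) < thresh
```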

4 System Setup

Real-world. We use a 16-DoF Allegro Hand with finger-like, front-facing vision-based tactile sensors attached to each of its fingertips. Each sensor can be streamed asynchronously along with the joint positions from the hand. The target joint commands are sent at a control rate of 20 Hz. The hand is attached to the end effector of a UR5 to provide different hand orientations for performing in-hand object rotation, as shown in Figure 1.

Simulation. We use Isaac Gym[53] for training the teacher and student policies. Each environment contains a simulated Allegro Hand with tactile sensors attached to each fingertip. Gravity is enabled for both the hand and the object. We perform system identification on simulation parameters in various hand directions to reduce the sim-to-real gap (detailed in Appendix E). We run the simulation at $dt = 1/60$ s and the policy control at 20 Hz.

[Figure 4]

Object Set. We use fundamental geometric shapes in Isaac Gym (capsule and box) for training. In simulation, we test on two out-of-distribution (OOD) object sets (see Figure 4): 1) OOD Mass, the training objects with heavier mass; 2) OOD Shape, a selection of unseen objects with different shapes. In the real world, we select 10 objects with different properties (see Table 8) to test the generalizability of the policy.

Evaluation. We run each experiment for 600 steps (equating to 30 seconds) and use the following metrics:
(i) Rotation Count (Rot) - the total number of rotations about the desired axis achieved per episode. In the real world, this is counted manually using reference markers attached to the object (visible as the tape in Figure 4).
(ii) Time to Terminate (TTT) - the time taken before the object gets stuck, falls out of the grasp, or the rotation axis deviates too far from the target.
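In simulation, the rotation count can be accumulated from the object's orientation trajectory; one way to do this (our sketch, not the paper's stated implementation, which counts manually in the real world) is to extract the twist of each incremental rotation about the desired axis:

```python
import numpy as np

def quat_mul(q1, q2):
    """Hamilton product of quaternions given as (w, x, y, z)."""
    w1, x1, y1, z1 = q1
    w2, x2, y2, z2 = q2
    return np.array([
        w1*w2 - x1*x2 - y1*y2 - z1*z2,
        w1*x2 + x1*w2 + y1*z2 - z1*y2,
        w1*y2 - x1*z2 + y1*w2 + z1*x2,
        w1*z2 + x1*y2 - y1*x2 + z1*w2,
    ])

def quat_conj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def rotation_count_increment(q_prev, q_curr, axis):
    """Twist of the incremental rotation about the desired axis, in
    revolutions (positive with the right-hand rule about `axis`)."""
    qd = quat_mul(q_curr, quat_conj(q_prev))        # delta rotation
    twist = 2.0 * np.arctan2(np.dot(qd[1:], axis), qd[0])
    return twist / (2.0 * np.pi)
```

Summing these increments over an episode gives the total rotation count about $\hat{k}$.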

5 Experiments and Analysis

First, we investigate our auxiliary goal formulation and adaptive curriculum for learning the multi-axis object rotation task (Section 5.1). Then we study the importance of rich tactile sensing for learning this dexterous manipulation task and conduct a quantitative analysis of the generalizability of the learned policies (Section 5.2). Finally, using the proposed sim-to-real approach, we deploy the policies in the real world on a range of object rotation tasks (Section 5.3).

Table 1: Rotation count (Rot) and episode length (EpLen) on the OOD object sets (mean ± std).

| Observation | OOD Mass Rot | OOD Mass EpLen (s) | OOD Shape Rot | OOD Shape EpLen (s) |
|---|---|---|---|---|
| Fixed Hand Orn | 0.55 ± 0.06 | 11.8 ± 0.2 | 0.55 ± 0.04 | 19.1 ± 0.5 |
| Proprio | 1.34 ± 0.07 | 21.5 ± 0.5 | 0.82 ± 0.02 | 25.1 ± 0.3 |
| Binary Touch | 1.90 ± 0.04 | 20.8 ± 0.5 | 1.57 ± 0.05 | 25.3 ± 0.2 |
| Discrete Touch | 1.95 ± 0.15 | 22.2 ± 0.4 | 1.67 ± 0.08 | 26.5 ± 0.1 |
| Dense Touch w/o Pose | 2.05 ± 0.04 | 22.0 ± 0.8 | 1.60 ± 0.02 | 25.5 ± 0.4 |
| Dense Touch w/o Force | 2.05 ± 0.05 | 21.9 ± 0.1 | 1.73 ± 0.03 | 26.7 ± 0.0 |
| Dense Touch | **2.18 ± 0.05** | **22.8 ± 0.8** | **1.77 ± 0.01** | **27.2 ± 0.3** |

5.1 Training Performance


We compare our auxiliary goal formulation against one that uses an angular rotation objective (w/o auxiliary goal), a common formulation of object rotation in prior work[5, 6, 31]. We also compare learning without the adaptive curriculum (w/o curriculum). More details of the baselines can be found in Appendices B.2 and B.3. The learning curves for average rotation and successive goals reached are shown in Figure 6. While the agent can learn to rotate objects in the single-axis setting using an angular rotation objective, this resulted in much lower accuracy, obtaining near-zero successive goals reached despite having a rotation axis penalty. In the multi-axis setting, training was unsuccessful and the agent was unable to maintain stable rotation. The agent also failed to learn without the adaptive reward curriculum: learning gets stuck at a local optimum where the agent keeps the object stable without rotating it. This is a difficult exploration problem, and without the guidance of an adaptive curriculum the agent cannot escape this local optimum.

5.2 Simulation Results

The results for multi-axis rotation in arbitrary hand orientations are shown in Table 5. First, we observe that a policy trained in a fixed hand orientation performed poorly in arbitrary hand orientations, suggesting that gravity invariance adds considerable complexity to the task despite the use of precision-grasp manipulation. Table 1 compares our dense touch policy (contact pose and contact force) with policies trained with proprioception, binary touch, and discrete touch (a discretized representation introduced in [6]).

Contrary to the findings in [6], we find binary touch to be beneficial over proprioception alone. In our case, we attribute this to including binary contact information during teacher training, which provides a better base policy. Overall, we found that performance improved with more detailed tactile sensing. The dense touch policy, trained with information regarding contact pose and force, outperformed policies that used simpler, less detailed touch. Moreover, discretizing the contact location led to a drop in performance compared with contact pose, suggesting that this type of representation is not as well suited to the morphology of our front-facing sensor. The ablation studies showed that contact force can provide useful information regarding the interaction physics, which improves the performance when handling objects with different mass properties; in addition, excluding either feature of dense touch resulted in suboptimal performance.

5.3 Real-world Results

| Observation | Palm Up | Palm Down | Base Up | Base Down | Thumb Up | Thumb Down |
|---|---|---|---|---|---|---|
| | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) | Rot / TTT(s) |
| Proprio | 1.47±0.69 / 27.6±3.8 | 1.05±0.37 / 25.3±4.0 | 0.84±0.30 / 26.8±3.6 | 0.87±0.46 / 22.8±9.6 | 0.78±0.53 / 20.3±9.9 | 0.51±0.65 / 9.50±8.9 |
| Binary Touch | 1.32±0.52 / 25.5±6.5 | 0.89±0.28 / 23.8±4.6 | 0.86±0.32 / 25.3±6.2 | 0.77±0.28 / 23.0±4.7 | 0.83±0.49 / 22.6±9.0 | 0.47±0.32 / 13.2±5.7 |
| Dense Touch | **1.57±0.57 / 30.0±0.0** | **1.33±0.44 / 28.2±3.1** | **1.32±0.32 / 29.8±0.6** | **1.17±0.38 / 29.4±1.8** | **1.08±0.47 / 27.9±3.1** | **0.91±0.33 / 29.2±2.0** |

Object rotation performance for various hand orientations and rotation axes is given in Tables 1 and 7. In both cases, the dense touch policy performed best, demonstrating successful transfer of the dense tactile observations. The proprioception and binary touch policies were less effective at maintaining stable rotation, often losing contact or getting stuck.

Hand orientations. Performance dropped progressively from the palm-up and palm-down directions, through base up and base down, to the thumb-up and thumb-down directions. We attribute this to the larger sim-to-real gap when the fingers are positioned horizontally during manipulation: in these orientations, gravity loading on the fingers acts against actuation, which weakens the hand. Nevertheless, despite the noisy system, the policy provided with rich tactile information consistently demonstrated stable performance. Examples are shown in Figure 9.

| Observation | x-axis (Rot / TTT(s)) | y-axis (Rot / TTT(s)) | z-axis (Rot / TTT(s)) |
|---|---|---|---|
| Proprio | 0.35±0.33 / 16.6±12.6 | 0.17±0.19 / 8.33±8.5 | 1.05±0.37 / 25.3±4.0 |
| Binary Touch | 0.87±0.43 / 26.5±5.4 | 0.25±0.18 / 15.9±10.5 | 0.89±0.28 / 23.8±4.6 |
| Dense Touch | **1.33±0.50 / 28.6±2.8** | **0.79±0.37 / 27.8±4.8** | **1.33±0.44 / 28.2±3.1** |

Rotation Axis. Rotation about the z-axis was the easiest to achieve, followed by the x- and y-axes. We noticed that binary touch produced results similar to proprioception when rotating about the z-axis, but performed better for the x- and y-rotation axes. The latter axes require two fingers to hold the object steady (middle/thumb or index/pinky) while the remaining two fingers provide a stable rotating motion. This requires more sophisticated finger gaiting, and the policy struggled to perform well with proprioception alone.

Tactile Sensor Response. An analysis of the processed tactile sensor outputs during a rollout is shown in Figure 9. The two key motions for stable object rotation can be seen in the output pose and force. Given rich tactile sensing on a multi-fingered hand, the policy can detect when the object is slipping out of a stable grasp and produce reactive finger-gaiting motions that prevent it from slipping further. This emergent behavior was not seen when using proprioception or binary touch.

Gravity Invariance. We also demonstrate that the trained policy can adapt effectively to a rotating hand, where the gravity vector is continuously changing in the hand's frame of reference. Examples of three hand trajectories are provided in Appendix K.2 and the accompanying Supplementary Video. The capability to manipulate objects during angular movements of the hand enables 6D reorientation of the object while simultaneously repositioning the grasp location, giving robot hands a new level of dexterity that could be beneficial in many tasks, e.g., general pick-and-place.
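As a minimal illustration of what gravity invariance entails, the gravity direction expressed in the hand's frame must be recomputed from the hand orientation at every step. The helper below is a hypothetical sketch (not the paper's code), assuming a row-major 3x3 rotation matrix that maps hand-frame vectors to world-frame vectors:

```python
def gravity_in_hand_frame(R_hand_to_world):
    """Express the world gravity direction (0, 0, -1) in the hand frame:
    g_hand = R^T g_world, where R is a row-major 3x3 rotation matrix
    mapping hand-frame vectors to world-frame vectors (an assumed convention)."""
    g_world = (0.0, 0.0, -1.0)
    # Multiplying by the transpose: component j of g_hand is column j of R
    # dotted with g_world.
    return tuple(
        sum(R_hand_to_world[i][j] * g_world[i] for i in range(3))
        for j in range(3)
    )
```

With the identity orientation, gravity points along the hand's -z axis; after flipping the hand palm-down (180 degrees about x), it points along +z in the hand frame, which is the kind of continuous change the policy must handle as the arm moves.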


6 Conclusion and Limitations

In this paper, we demonstrated the capability of a general policy leveraging rich tactile sensing to perform in-hand object rotation about any rotation axis in any hand direction. This marks a significant step toward more general tactile dexterity with fully-actuated multi-fingered robot hands.

While dense touch generally gave the best performance, it still had difficulty with box-shaped objects and objects with larger aspect ratios. We attribute this to certain grasping points producing similar tactile information for different states of the system. Richer tactile representations, such as tactile images or contact force fields, or integrating vision, could help infer additional properties and enhance robustness. In addition, the actuation of the Allegro Hand was significantly weakened under certain hand orientations; designing low-cost and more capable hardware is therefore crucial for advancing dexterous manipulation with multi-fingered robotic hands.

The ability to manipulate objects effortlessly in free space using a sense of touch mirrors a key aspect of human dexterity and stands as a significant goal in robot manipulation. We hope that our research underscores the importance of tactile sensing and spurs continued efforts towards this goal.

Acknowledgments

We thank Andrew Stinchcombe for helping with the 3D-printing of the stimuli and tactile sensors. We thank Haozhi Qi for the valuable discussions. This work was supported by the EPSRC Doctoral Training Partnership (DTP) scholarship.

References

  • Okamura etal. [2000]A.M. Okamura, N.Smaby, and M.R. Cutkosky.An overview of dexterous manipulation.In Proceedings 2000 ICRA. Millennium Conference. IEEE International Conference on Robotics and Automation. Symposia Proceedings (Cat. No. 00CH37065), volume1, pages 255–262. IEEE, 2000.
  • Akkaya et al. [2019]I. Akkaya, M. Andrychowicz, M. Chociej, M. Litwin, B. McGrew, A. Petron, A. Paino, M. Plappert, G. Powell, R. Ribas, et al. Solving Rubik's Cube with a robot hand. arXiv preprint arXiv:1910.07113, 2019.
  • Andrychowicz etal. [2020]O.M. Andrychowicz, B.Baker, M.Chociej, R.Jozefowicz, B.McGrew, J.Pachocki, A.Petron, M.Plappert, G.Powell, A.Ray, etal.Learning dexterous in-hand manipulation.The International Journal of Robotics Research, 39(1):3–20, 2020.
  • Khandate etal. [2022]G.Khandate, M.Haas-Heger, and M.Ciocarlie.On the feasibility of learning finger-gaiting in-hand manipulation with intrinsic sensing.In 2022 International Conference on Robotics and Automation (ICRA), pages 2752–2758. IEEE, 2022.
  • Qi etal. [2023a]H.Qi, A.Kumar, R.Calandra, Y.Ma, and J.Malik.In-hand object rotation via rapid motor adaptation.In Conference on Robot Learning, pages 1722–1732. PMLR, 2023a.
  • Qi etal. [2023b]H.Qi, B.Yi, S.Suresh, M.Lambeta, Y.Ma, R.Calandra, and J.Malik.General in-hand object rotation with vision and touch.In Conference on Robot Learning, pages 2549–2564. PMLR, 2023b.
  • Khandate etal. [2023]G.Khandate, S.Shang, E.T. Chang, T.L. Saidi, J.Adams, and M.Ciocarlie.Sampling-based exploration for reinforcement learning of dexterous manipulation.arXiv preprint arXiv:2303.03486, 2023.
  • Han and Trinkle [1998]L.Han and J.C. Trinkle.Dextrous manipulation by rolling and finger gaiting.In Proceedings. 1998 IEEE International Conference on Robotics and Automation (Cat. No. 98CH36146), volume1, pages 730–735. IEEE, 1998.
  • Han etal. [1997]L.Han, Y.-S. Guan, Z.Li, Q.Shi, and J.C. Trinkle.Dextrous manipulation with rolling contacts.In Proceedings of International Conference on Robotics and Automation, volume2, pages 992–997. IEEE, 1997.
  • Bicchi and Sorrentino [1995]A.Bicchi and R.Sorrentino.Dexterous manipulation through rolling.In Proceedings of 1995 IEEE International Conference on Robotics and Automation, volume1, pages 452–457. IEEE, 1995.
  • Rus [1999]D.Rus.In-hand dexterous manipulation of piecewise-smooth 3-d objects.The International Journal of Robotics Research, 18(4):355–381, 1999.
  • Fearing [1986]R.Fearing.Implementing a force strategy for object re-orientation.In Proceedings. 1986 IEEE International Conference on Robotics and Automation, volume3, pages 96–102. IEEE, 1986.
  • Leveroni and Salisbury [1996]S.Leveroni and K.Salisbury.Reorienting objects with a robot hand using grasp gaits.In Robotics Research: The Seventh International Symposium, pages 39–51. Springer, 1996.
  • Platt et al. [2004]R. Platt, A.H. Fagg, and R.A. Grupen. Manipulation gaits: Sequences of grasp control tasks. In IEEE International Conference on Robotics and Automation, 2004. Proceedings. ICRA'04. 2004, volume 1, pages 801–806. IEEE, 2004.
  • Saut etal. [2007]J.-P. Saut, A.Sahbani, S.El-Khoury, and V.Perdereau.Dexterous manipulation planning using probabilistic roadmaps in continuous grasp subspaces.In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 2907–2912. IEEE, 2007.
  • Bai and Liu [2014]Y.Bai and C.K. Liu.Dexterous manipulation using both palm and fingers.In 2014 IEEE International Conference on Robotics and Automation (ICRA), pages 1560–1565. IEEE, 2014.
  • Shi etal. [2017]J.Shi, J.Z. Woodruff, P.B. Umbanhowar, and K.M. Lynch.Dynamic in-hand sliding manipulation.IEEE Transactions on Robotics, 33(4):778–795, 2017.
  • Teeple etal. [2022]C.B. Teeple, B.Aktaş, M.C. Yuen, G.R. Kim, R.D. Howe, and R.J. Wood.Controlling palm-object interactions via friction for enhanced in-hand manipulation.IEEE Robotics and Automation Letters, 7(2):2258–2265, 2022.
  • Fan etal. [2017]Y.Fan, W.Gao, W.Chen, and M.Tomizuka.Real-time finger gaits planning for dexterous manipulation.IFAC-PapersOnLine, 50(1):12765–12772, 2017.
  • Sundaralingam and Hermans [2018]B.Sundaralingam and T.Hermans.Geometric in-hand regrasp planning: Alternating optimization of finger gaits and in-grasp manipulation.In 2018 IEEE International Conference on Robotics and Automation (ICRA), pages 231–238. IEEE, 2018.
  • Morgan etal. [2022]A.S. Morgan, K.Hang, B.Wen, K.Bekris, and A.M. Dollar.Complex in-hand manipulation via compliance-enabled finger gaiting and multi-modal planning.IEEE Robotics and Automation Letters, 7(2):4821–4828, 2022.
  • Khadivar and Billard [2023]F.Khadivar and A.Billard.Adaptive fingers coordination for robust grasp and in-hand manipulation under disturbances and unknown dynamics.IEEE Transactions on Robotics, 2023.
  • Gao etal. [2023]X.Gao, K.Yao, F.Khadivar, and A.Billard.Real-time motion planning for in-hand manipulation with a multi-fingered hand.arXiv preprint arXiv:2309.06955, 2023.
  • Pang etal. [2023]T.Pang, H.T. Suh, L.Yang, and R.Tedrake.Global planning for contact-rich manipulation via local smoothing of quasi-dynamic contact models.IEEE Transactions on Robotics, 2023.
  • Nagabandi etal. [2020]A.Nagabandi, K.Konolige, S.Levine, and V.Kumar.Deep dynamics models for learning dexterous manipulation.In Conference on Robot Learning, pages 1101–1112. PMLR, 2020.
  • Huang etal. [2023]B.Huang, Y.Chen, T.Wang, Y.Qin, Y.Yang, N.Atanasov, and X.Wang.Dynamic handover: Throw and catch with bimanual hands.arXiv preprint arXiv:2309.05655, 2023.
  • Chen etal. [2022a]T.Chen, J.Xu, and P.Agrawal.A system for general in-hand object re-orientation.In Conference on Robot Learning, pages 297–307. PMLR, 2022a.
  • Chen etal. [2022b]T.Chen, M.Tippur, S.Wu, V.Kumar, E.Adelson, and P.Agrawal.Visual dexterity: In-hand dexterous manipulation from depth.arXiv preprint arXiv:2211.11744, 2022b.
  • Allshire et al. [2022]A. Allshire, M. Mittal, V. Lodaya, V. Makoviychuk, D. Makoviichuk, F. Widmaier, M. Wüthrich, S. Bauer, A. Handa, and A. Garg. Transferring dexterous manipulation from gpu simulation to a remote real-world trifinger. In 2022 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 11802–11809. IEEE, 2022.
  • Handa etal. [2023]A.Handa, A.Allshire, V.Makoviychuk, A.Petrenko, R.Singh, J.Liu, D.Makoviichuk, K.VanWyk, A.Zhurkevich, B.Sundaralingam, etal.Dextreme: Transfer of agile in-hand manipulation from simulation to reality.In 2023 IEEE International Conference on Robotics and Automation (ICRA), pages 5977–5984. IEEE, 2023.
  • Yin etal. [2023]Z.-H. Yin, B.Huang, Y.Qin, Q.Chen, and X.Wang.Rotating without seeing: Towards in-hand dexterity through touch.arXiv preprint arXiv:2303.10880, 2023.
  • Chen etal. [2023]T.Chen, M.Tippur, S.Wu, V.Kumar, E.Adelson, and P.Agrawal.Visual dexterity: In-hand reorientation of novel and complex object shapes.Science Robotics, 8(84):eadc9244, 2023.
  • Sievers etal. [2022]L.Sievers, J.Pitz, and B.Bäuml.Learning purely tactile in-hand manipulation with a torque-controlled hand.In 2022 International Conference on Robotics and Automation (ICRA), pages 2745–2751. IEEE, 2022.
  • Röstel etal. [2023]L.Röstel, J.Pitz, L.Sievers, and B.Bäuml.Estimator-coupled reinforcement learning for robust purely tactile in-hand manipulation.In 2023 IEEE-RAS 22nd International Conference on Humanoid Robots (Humanoids), pages 1–8. IEEE, 2023.
  • Pitz etal. [2023]J.Pitz, L.Röstel, L.Sievers, and B.Bäuml.Dextrous tactile in-hand manipulation using a modular reinforcement learning architecture.arXiv preprint arXiv:2303.04705, 2023.
  • Yuan etal. [2017]W.Yuan, S.Dong, and E.H. Adelson.GelSight: High-Resolution Robot Tactile Sensors for Estimating Geometry and Force.Sensors, 17(12):2762, Dec. 2017.doi:10.3390/s17122762.
  • Ward-Cherrier etal. [2018]B.Ward-Cherrier, N.Pestell, L.Cramphorn, B.Winstone, M.E. Giannaccini, J.Rossiter, and N.F. Lepora.The tactip family: Soft optical tactile sensors with 3d-printed biomimetic morphologies.Soft robotics, 5(2):216–227, 2018.
  • Lambeta etal. [2020]M.Lambeta, P.-W. Chou, S.Tian, B.Yang, B.Maloon, V.R. Most, D.Stroud, R.Santos, A.Byagowi, G.Kammerer, etal.Digit: A novel design for a low-cost compact high-resolution tactile sensor with application to in-hand manipulation.IEEE Robotics and Automation Letters, 5(3):3838–3845, 2020.
  • Lepora etal. [2022]N.F. Lepora, Y.Lin, B.Money-Coomes, and J.Lloyd.Digitac: A digit-tactip hybrid tactile sensor for comparing low-cost high-resolution robot touch.IEEE Robotics and Automation Letters, 7(4):9382–9388, 2022.
  • Gomes et al. [2021]D.F. Gomes, P. Paoletti, and S. Luo. Generation of gelsight tactile images for sim2real learning. IEEE Robotics and Automation Letters, 6(2):4177–4184, Apr. 2021.
  • Wang etal. [2022]S.Wang, M.Lambeta, P.-W. Chou, and R.Calandra.Tacto: A fast, flexible, and open-source simulator for high-resolution vision-based tactile sensors.IEEE Robotics and Automation Letters, 7(2):3930–3937, 2022.
  • Si and Yuan [2022]Z.Si and W.Yuan.Taxim: An example-based simulation model for gelsight tactile sensors.IEEE Robotics and Automation Letters, pages 2361–2368, 2022.
  • Chen etal. [2023]Z.Chen, S.Zhang, S.Luo, F.Sun, and B.Fang.Tacchi: A pluggable and low computational cost elastomer deformation simulator for optical tactile sensors.IEEE Robotics and Automation Letters, 8(3):1239–1246, 2023.doi:10.1109/LRA.2023.3237042.
  • Church etal. [2021]A.Church, J.Lloyd, R.Hadsell, and N.Lepora.Tactile Sim-to-Real Policy Transfer via Real-to-Sim Image Translation.In Proceedings of the 5th Conference on Robot Learning, pages 1–9. PMLR, Oct. 2021.
  • Jianu etal. [2021]T.Jianu, D.F. Gomes, and S.Luo.Reducing tactile sim2real domain gaps via deep texture generation networks.arXiv preprint arXiv:2112.01807, 2021.
  • Chen etal. [2022]W.Chen, Y.Xu, Z.Chen, P.Zeng, R.Dang, R.Chen, and J.Xu.Bidirectional sim-to-real transfer for gelsight tactile sensors with cyclegan.IEEE Robotics and Automation Letters, 7(3):6187–6194, 2022.
  • Xu etal. [2023]J.Xu, S.Kim, T.Chen, A.R. Garcia, P.Agrawal, W.Matusik, and S.Sueda.Efficient tactile simulation with differentiability for robotic manipulation.In Proceedings of The 6th Conference on Robot Learning, volume 205 of Proceedings of Machine Learning Research, pages 1488–1498. PMLR, 14–18 Dec 2023.URL https://proceedings.mlr.press/v205/xu23b.html.
  • Luu etal. [2023]Q.K. Luu, N.H. Nguyen, etal.Simulation, learning, and application of vision-based tactile sensing at large scale.IEEE Transactions on Robotics, 2023.
  • Lin etal. [2022]Y.Lin, J.Lloyd, A.Church, and N.Lepora.Tactile gym 2.0: Sim-to-real deep reinforcement learning for comparing low-cost high-resolution robot touch.volume7 of Proceedings of Machine Learning Research, pages 10754–10761. IEEE, August 2022.doi:10.1109/LRA.2022.3195195.URL https://ieeexplore.ieee.org/abstract/document/9847020.
  • Lin etal. [2023]Y.Lin, A.Church, M.Yang, H.Li, J.Lloyd, D.Zhang, and N.F. Lepora.Bi-touch: Bimanual tactile manipulation with sim-to-real deep reinforcement learning.IEEE Robotics and Automation Letters, 2023.
  • Yang etal. [2023]M.Yang, Y.Lin, A.Church, J.Lloyd, D.Zhang, D.A. Barton, and N.F. Lepora.Sim-to-real model-based and model-free deep reinforcement learning for tactile pushing.IEEE Robotics and Automation Letters, 2023.
  • Schulman etal. [2017]J.Schulman, F.Wolski, P.Dhariwal, A.Radford, and O.Klimov.Proximal policy optimization algorithms.arXiv preprint arXiv:1707.06347, 2017.
  • Makoviychuk etal. [2021]V.Makoviychuk, L.Wawrzyniak, Y.Guo, M.Lu, K.Storey, M.Macklin, D.Hoeller, N.Rudin, A.Allshire, A.Handa, etal.Isaac gym: High performance gpu-based physics simulation for robot learning.arXiv preprint arXiv:2108.10470, 2021.
  • Brahmbhatt etal. [2019]S.Brahmbhatt, C.Ham, C.C. Kemp, and J.Hays.Contactdb: Analyzing and predicting grasp contact via thermal imaging.In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8709–8719, 2019.
  • Hansen etal. [2019]N.Hansen, Y.Akimoto, and P.Baudis.CMA-ES/pycma on Github.Zenodo, DOI:10.5281/zenodo.2559634, Feb. 2019.URL https://doi.org/10.5281/zenodo.2559634.
  • Makoviichuk and Makoviychuk [2021]D.Makoviichuk and V.Makoviychuk.rl-games: A high-performance framework for reinforcement learning.https://github.com/Denys88/rl_games, May 2021.
  • Kumar etal. [2021]A.Kumar, Z.Fu, D.Pathak, and J.Malik.Rma: Rapid motor adaptation for legged robots.arXiv preprint arXiv:2107.04034, 2021.
  • Lepora [2021]N.F. Lepora.Soft Biomimetic Optical Tactile Sensing With the TacTip: A Review.IEEE Sensors Journal, 21(19):21131–21143, Oct. 2021.ISSN 1530-437X, 1558-1748, 2379-9153.doi:10.1109/JSEN.2021.3100645.
  • Bradski [2000]G.Bradski.The OpenCV Library.Dr. Dobb’s Journal of Software Tools, 2000.

Appendix A Observations and Privileged Information

The full list of real-world observations $o_t$ and privileged information $x_t$ used by the agent is presented in Tables 2 and 3, respectively. The privileged information is used to train the teacher with RL and to obtain the target latent vector $\bar{z}$ during student training; it is not used during deployment. The proprioception and tactile dimensions are in multiples of four, representing the four fingers.

Table 2: Real-world observations $o_t$.

| Name | Symbol | Dimensions |
|---|---|---|
| *Proprioception* | | |
| Joint Position | $q$ | 16 |
| Fingertip Position | $f^p$ | 12 |
| Fingertip Orientation | $f^o$ | 16 |
| Previous Action | $a_{t-1}$ | 16 |
| Target Joint Positions | $\bar{q}$ | 16 |
| *Tactile* | | |
| Binary Contact | $c$ | 4 |
| Contact Pose | $P$ | 8 |
| Contact Force Magnitude | $F$ | 4 |
| *Task* | | |
| Target Rotation Axis | $\hat{k}$ | 3 |

Table 3: Privileged information $x_t$.

| Name | Symbol | Dimensions |
|---|---|---|
| *Object Information* | | |
| Position | $p_o$ | 3 |
| Orientation | $r_o$ | 4 |
| Angular Velocity | $\omega_r$ | 3 |
| Dimensions | $\mathrm{dim}_o$ | 2 |
| Center of Mass | $\mathrm{COM}_o$ | 3 |
| Mass | $m_o$ | 1 |
| Gravity Vector | $\hat{g}$ | 3 |
| *Auxiliary Goal Information* | | |
| Position | $p_g$ | 3 |
| Orientation | $r_g$ | 4 |
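As a quick sanity check, the real-world observation dimensions listed above compose into a single flat policy input. The grouping below mirrors Table 2; the concatenation order is an assumption for illustration.

```python
# Dimensions of each real-world observation group (from Table 2).
OBS_DIMS = {
    "joint_position": 16,          # 16 joints
    "fingertip_position": 12,      # 4 fingers x 3D position
    "fingertip_orientation": 16,   # 4 fingers x quaternion
    "previous_action": 16,
    "target_joint_positions": 16,
    "binary_contact": 4,           # one flag per fingertip
    "contact_pose": 8,             # 4 fingers x 2D contact pose
    "contact_force_magnitude": 4,
    "target_rotation_axis": 3,
}

# Total flat observation size fed to the student policy (assumed layout).
obs_dim = sum(OBS_DIMS.values())
```

Summing the table entries gives a 95-dimensional observation, with all proprioceptive and tactile groups in multiples of four as stated.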

Appendix B Reward Function

B.1 Base Reward

In the following, we explicitly define each term of the reward function used for learning multi-axis object rotation. The full reward function is:

$r = r_{\rm rotation} + r_{\rm contact} + r_{\rm stable} + r_{\rm terminate}$,   (2)

where,

$r_{\rm rotation} = \lambda_{\rm kp} r_{\rm kp} + \lambda_{\rm rot} r_{\rm rot} + \lambda_{\rm goal} r_{\rm goal}$,
$r_{\rm contact} = \lambda_{\rm rew} (\lambda_{\rm gc} r_{\rm gc} + \lambda_{\rm bc} r_{\rm bc})$,
$r_{\rm stable} = \lambda_{\rm rew} (\lambda_{\omega} r_{\omega} + \lambda_{\rm pose} r_{\rm pose} + \lambda_{\rm work} r_{\rm work} + \lambda_{\rm torque} r_{\rm torque})$,
$r_{\rm terminate} = \lambda_{\rm penalty} r_{\rm penalty}$.

Keypoint Distance Reward:

$r_{\rm kp} = \dfrac{1}{e^{a\,d_{\rm kp}} + b + e^{-a\,d_{\rm kp}}}$   (3)

where the keypoint distance $d_{\rm kp} = \frac{1}{N} \sum_{i=1}^{N} \| k^{\rm o}_i - k^{\rm g}_i \|$, and $k^{\rm o}$ and $k^{\rm g}$ are the keypoint positions of the object and goal respectively. We use $N = 6$ keypoints placed 5 cm from the object origin along each of its principal axes, with parameters $a = 50$ and $b = 2.0$.
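The keypoint reward can be sketched directly from these definitions. This is an illustrative implementation assuming the kernel takes the keypoint distance as its argument and is maximal at zero distance; function names are ours, not the paper's.

```python
import math

def keypoint_distance(obj_kps, goal_kps):
    """Mean Euclidean distance between corresponding object and goal
    keypoints (the d_kp term above)."""
    assert len(obj_kps) == len(goal_kps)
    return sum(math.dist(ko, kg) for ko, kg in zip(obj_kps, goal_kps)) / len(obj_kps)

def keypoint_reward(d_kp, a=50.0, b=2.0):
    """Bounded kernel of the keypoint distance: peaks at 1/(2+b) when the
    object keypoints coincide with the goal keypoints, and decays smoothly
    as the distance grows."""
    return 1.0 / (math.exp(a * d_kp) + b + math.exp(-a * d_kp))
```

With $a=50$ and $b=2$, the reward is 0.25 at zero distance and falls off sharply, so only near-goal configurations earn a meaningful dense reward.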

Rotation Reward:

$r_{\rm rot} = \mathrm{clip}(\Delta\Theta \cdot \hat{k};\, -c_1, c_1)$   (4)

The rotation reward is the change in object rotation about the target rotation axis, clipped to the range $\pm c_1 = 0.025$ rad.
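A minimal sketch of this term, assuming $\Delta\Theta$ is the per-step axis-angle rotation increment (3-vector) and $\hat{k}$ a unit target axis (our naming, not the paper's code):

```python
def rotation_reward(delta_theta, axis, c1=0.025):
    """Project the rotation increment onto the target axis and clip to
    +/- c1 rad, so the agent cannot exploit large single-step rotations."""
    proj = sum(d * k for d, k in zip(delta_theta, axis))
    return max(-c1, min(c1, proj))
```

Rotation against the target axis yields a (clipped) negative reward, which discourages spinning the object the wrong way.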

Goal Bonus Reward:

rgoal={1ifkpdist<dtol0otherwisesubscript𝑟goalcases1if𝑘subscript𝑝distsubscript𝑑tolotherwise0otherwiseotherwiser_{\rm goal}=\begin{cases}1\quad{\rm if}\ kp_{\rm dist}<d_{\rm tol}\\0\quad\text{otherwise}\\\end{cases}italic_r start_POSTSUBSCRIPT roman_goal end_POSTSUBSCRIPT = { start_ROW start_CELL 1 roman_if italic_k italic_p start_POSTSUBSCRIPT roman_dist end_POSTSUBSCRIPT < italic_d start_POSTSUBSCRIPT roman_tol end_POSTSUBSCRIPT end_CELL start_CELL end_CELL end_ROW start_ROW start_CELL 0 otherwise end_CELL start_CELL end_CELL end_ROW(5)

where the keypoint distance tolerance $d_{\rm tol}$ determines when a goal has been reached.

Good Contact Reward:

$r_{\rm gc}=\begin{cases}1&\text{if }n_{\rm tip\_contact}\geq 2\\0&\text{otherwise}\end{cases}$  (6)

where $n_{\rm tip\_contact}=\mathrm{sum}(c)$. This rewards the agent when the number of tip contacts is greater than or equal to 2, encouraging stable grasping contacts.

Bad Contact Penalty:

$r_{\rm bc}=\begin{cases}1&\text{if }n_{\rm non\_tip\_contact}>0\\0&\text{otherwise}\end{cases}$  (7)

where $n_{\rm non\_tip\_contact}$ is the number of contacts with the object that are not made by a fingertip, accumulated over all contacts in the simulation.

Angular Velocity Penalty:

$r_{\omega}=-\max(\|\omega_{o}\|-\omega_{\max},\,0)$  (8)

where the maximum angular velocity $\omega_{\max}=0.5$. This term penalises the agent when the angular velocity of the object exceeds the maximum.

Pose Penalty:

$r_{\rm pose}=-\|q-q_{0}\|$  (9)

where $q_{0}$ denotes the joint positions of a canonical grasping pose.

Work Penalty:

$r_{\rm work}=-\tau^{T}\bar{q}$  (10)

Torque Penalty:

$r_{\rm torque}=-\|\tau\|$  (11)

where, in the above, $\tau$ is the torque applied to the joints during an action step.

Termination Penalty:

$r_{\rm terminate}=\begin{cases}-1&\text{if }(kp_{\rm dist}>d_{\max})\ \text{or}\ (\hat{k}_{o}>\hat{k}_{\max})\\0&\text{otherwise}\end{cases}$  (12)

Here we define two conditions that terminate an episode. The first represents the object falling out of grasp, for which we use a maximum keypoint distance $d_{\max}=0.1$. The second represents the deviation of the object rotation axis ($\hat{k}_{o}$) from the target rotation axis exceeding a maximum $\hat{k}_{\max}=45^{\circ}$.

The corresponding weights for the reward terms are: $\lambda_{\rm kp}=1.0$, $\lambda_{\rm rot}=5.0$, $\lambda_{\rm goal}=10.0$, $\lambda_{\rm gc}=0.1$, $\lambda_{\rm bc}=0.2$, $\lambda_{\omega}=0.75$, $\lambda_{\rm pose}=0.2$, $\lambda_{\rm work}=2.0$, $\lambda_{\rm torque}=1.0$, $\lambda_{\rm penalty}=50.0$.
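For illustration, the weighted sum over these terms might be assembled as below. This is a sketch, not the paper's exact composition: we fold each penalty's sign into its weight, assuming the penalty terms themselves are passed in as non-negative magnitudes.

```python
# Reward weights listed above; penalty weights carry a negative sign here,
# an assumption made so that all term values can be passed as magnitudes.
WEIGHTS = {"kp": 1.0, "rot": 5.0, "goal": 10.0, "gc": 0.1, "bc": -0.2,
           "omega": -0.75, "pose": -0.2, "work": -2.0, "torque": -1.0,
           "penalty": -50.0}

def total_reward(terms):
    """terms: mapping from a term name in WEIGHTS to its unweighted value."""
    return sum(WEIGHTS[name] * value for name, value in terms.items())
```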

B.2 Alternative Reward

We also formulate an alternative reward function consisting of an angular velocity reward and rotation axis penalty to compare with our auxiliary goal formulation.

Angular Velocity Reward:

$r_{\rm av}=\text{clip}(\omega\cdot\hat{k},\,-c_{2},\,c_{2})$  (13)

where $c_{2}=0.5$.

Rotation Axis Penalty:

$r_{\rm axis}=1-\dfrac{\hat{k}\cdot\hat{k}_{o}}{\|\hat{k}\|\,\|\hat{k}_{o}\|}$  (14)

where $\hat{k}_{o}$ is the current object rotation axis.

We form the new rotation reward $r_{\rm rotation}=\lambda_{\rm av}r_{\rm av}+\lambda_{\rm rot}r_{\rm rot}$. We add an object axis penalty $\lambda_{\rm axis}r_{\rm axis}$ to the $r_{\rm stable}$ term and remove the angular velocity penalty ($\lambda_{\omega}=0$). The weights are $\lambda_{\rm av}=1.5$ and $\lambda_{\rm axis}=1.0$; all other terms of the reward function are kept the same.
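The rotation axis penalty of Eq. 14 is simply one minus the cosine similarity between the target and measured axes; a minimal sketch (function name is ours):

```python
import math

def axis_penalty(k_target, k_obj):
    """1 - cosine similarity between the target rotation axis and the
    current object rotation axis (Eq. 14); zero when perfectly aligned."""
    dot = sum(a * b for a, b in zip(k_target, k_obj))
    norm = (math.sqrt(sum(a * a for a in k_target))
            * math.sqrt(sum(b * b for b in k_obj)))
    return 1.0 - dot / norm
```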

B.3 Adaptive Reward Curriculum

The adaptive reward curriculum is implemented as a linear schedule on the reward curriculum coefficient in $\lambda_{\rm rew}(r_{\rm contact}+r_{\rm stable})$, which increases as successive goals are reached per episode,

$\lambda_{\rm rew}=\dfrac{g_{\rm eval}-g_{\min}}{g_{\max}-g_{\min}}$  (15)

where $[g_{\min},g_{\max}]$ determines the range over which the reward curriculum is active. This shifts the learning objective towards more realistic finger-gaiting motions as the contact and stability rewards increase. We use $[g_{\min},g_{\max}]=[1.0,2.0]$.
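The schedule of Eq. 15 can be sketched as a one-line function; clamping the coefficient to $[0,1]$ outside the active range is our assumption:

```python
def curriculum_coeff(g_eval, g_min=1.0, g_max=2.0):
    """Linear reward-curriculum schedule (Eq. 15); the clamp to [0, 1]
    outside the active range [g_min, g_max] is an assumption."""
    return min(max((g_eval - g_min) / (g_max - g_min), 0.0), 1.0)
```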

Appendix C Grasp Generation

To generate stable grasps, we initialize the object at random orientations 13 cm above the base of the hand and initialize the hand at a canonical grasp pose in the palm-up orientation. We then sample relative offsets to the joint positions from $\mathcal{U}(-0.3,0.3)$ rad. We run the simulation for 120 steps (6 seconds) while sequentially changing the gravity direction through the 6 principal axes of the hand ($\pm xyz$-axes). We save the object orientation and joint positions (10,000 grasp poses per object) if the following conditions are satisfied:
- The number of tip contacts is greater than 2.
- The number of non-tip contacts is zero.
- The total fingertip-to-object distance is less than 0.2.
- The object remains stable for the duration of the episode.
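The four acceptance conditions above can be mirrored in a small filter (illustrative; the names, argument types, and calling convention are ours):

```python
def is_stable_grasp(n_tip_contacts, n_non_tip_contacts,
                    tip_to_object_dist, object_stable):
    """Accept a sampled grasp only if all four saving conditions hold."""
    return (n_tip_contacts > 2            # tip contacts greater than 2
            and n_non_tip_contacts == 0   # no non-tip contacts
            and tip_to_object_dist < 0.2  # total fingertip-object distance
            and object_stable)            # stable for the whole episode
```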

Appendix D Domain Randomization

In addition to the initial grasping pose, target rotation axis, and hand orientation, we include additional domain randomization during teacher and student training to improve sim-to-real performance (shown in Table 4).

Table 4: Domain randomization ranges.

Object
  Capsule Radius (m): [0.025, 0.034]
  Capsule Width (m): [0.000, 0.012]
  Box Width (m): [0.045, 0.06]
  Box Height (m): [0.045, 0.06]
  Mass (kg): [0.025, 0.20]
  Friction: 10.0
  Center of Mass (m): [-0.01, 0.01]
  Disturbance Scale: 2.0
  Disturbance Probability: 0.25
  Disturbance Decay: 0.99

Hand
  Friction: 10.0
  PD Controller Stiffness: $\times\,\mathcal{U}(0.9,1.1)$
  PD Controller Damping: $\times\,\mathcal{U}(0.9,1.1)$
  Observation Joint Noise: 0.03
  Observation Fingertip Position Noise: 0.005
  Observation Fingertip Orientation Noise: 0.01

Tactile
  Observation Pose Noise: 0.0174
  Observation Force Noise: 0.1

Appendix E System Identification

To reduce the sim-to-real gap of the Allegro Hand, we perform system identification to match the simulated robot hand to the real hand. We model each of the 16 DoF of the hand with five parameters (stiffness, damping, mass, friction, and armature), giving a total of 80 parameters to optimize. We collect corresponding trajectories in simulation and the real world in various hand orientations and use CMA-ES [55] to find the simulation parameters that minimize the mean-squared error between the trajectories.
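The fitting loop has roughly the following shape. In this sketch a naive random search stands in for CMA-ES, and `simulate` is a hypothetical rollout function returning joint trajectories for a given parameter vector:

```python
import random

def trajectory_mse(sim_traj, real_traj):
    """Mean-squared error between corresponding joint trajectories."""
    n = sum(len(t) for t in real_traj)
    return sum((s - r) ** 2
               for st, rt in zip(sim_traj, real_traj)
               for s, r in zip(st, rt)) / n

def fit_parameters(simulate, real_traj, init_params, iters=100, seed=0):
    """Hill-climb simulation parameters to match real trajectories.
    (The paper uses CMA-ES; random perturbation is a stand-in here.)"""
    rng = random.Random(seed)
    best_p = list(init_params)
    best_e = trajectory_mse(simulate(best_p), real_traj)
    for _ in range(iters):
        cand = [p * (1.0 + rng.gauss(0.0, 0.1)) for p in best_p]
        err = trajectory_mse(simulate(cand), real_traj)
        if err < best_e:
            best_p, best_e = cand, err
    return best_p, best_e
```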

Appendix F Simulated Tactile Processing

To simulate our soft tactile sensor in a rigid-body simulator, we process the contact information received from the simulator to construct the tactile observations. We use contact force information to compute binary contact signals:

$c=\begin{cases}1&\text{if }\|\mathbf{F}\|>0.25\,\text{N}\\0&\text{otherwise}\end{cases}$  (16)

A contact force threshold of 0.25 N was selected to simulate the binary contact detection of the real sensor. For contact force information, we simulate sensing delay caused by elastic deformation of the soft tip in the real world by applying an exponential average on the received force readings.

$\mathbf{F}=\alpha F_{t}+(1-\alpha)F_{t-1}$  (17)

where we use $\alpha=0.5$. We then apply a saturation limit and rescaling to align simulated contact force sensing ranges with the ranges experienced in the real world.

$\mathbf{F}=\beta_{F}\,\mathrm{clip}(F,\ F_{\min},\ F_{\max})$  (18)

where we use $\beta_{F}=0.6$, $F_{\min}=0.0$ N, $F_{\max}=5.0$ N. We apply the same saturation and rescaling to the contact pose.

$\mathbf{P}=\beta_{P}\,\mathrm{clip}(P,\ P_{\min},\ P_{\max})$  (19)

where we use $\beta_{P}=0.6$, $P_{\min}=-0.53$ rad, $P_{\max}=0.53$ rad. We use the binary contact signals to mask the contact pose and contact force observations, minimizing noise in the tactile feedback. The same masking is applied in the real world.
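Putting Eqs. 16-19 together, the simulated tactile processing can be sketched as below. Per-component clipping and the order of masking relative to rescaling are our assumptions:

```python
import math

def clip(v, lo, hi):
    return max(lo, min(hi, v))

def process_contact(force_xyz, prev_force, pose_xy, alpha=0.5,
                    thresh=0.25, beta_f=0.6, f_lim=(0.0, 5.0),
                    beta_p=0.6, p_lim=(-0.53, 0.53)):
    """Binary contact (Eq. 16), EMA force smoothing (Eq. 17), and
    clip/rescale of force (Eq. 18) and pose (Eq. 19), with masking."""
    f = [alpha * ft + (1 - alpha) * fp
         for ft, fp in zip(force_xyz, prev_force)]                  # Eq. 17
    contact = 1 if math.sqrt(sum(x * x for x in f)) > thresh else 0  # Eq. 16
    f_obs = [beta_f * clip(x, *f_lim) for x in f]                   # Eq. 18
    p_obs = [beta_p * clip(x, *p_lim) for x in pose_xy]             # Eq. 19
    if not contact:  # mask pose/force observations with the binary signal
        f_obs, p_obs = [0.0] * len(f_obs), [0.0] * len(p_obs)
    return contact, f_obs, p_obs
```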

Appendix G Architecture and Policy Training

The network architecture and training hyperparameters are shown in Table 5. The proprioception policy uses an observation input dimension of $N=79$; binary touch uses $N=83$ and full touch $N=95$. We use a history of 30 time steps as input to the temporal convolutional network (TCN) and encode the privileged information into a latent vector of size $n=8$ for all policies.

Table 5: Network architecture and training hyperparameters.

Teacher
  MLP Input Dim: 18
  MLP Hidden Units: [256, 128, 8]
  MLP Activation: ReLU
  Policy Hidden Units: [512, 256, 128]
  Policy Activation: ELU
  Learning Rate: $5\times10^{-3}$
  Num Envs: 8192
  Rollout Steps: 8
  Minibatch Size: 32768
  Num Mini Epochs: 5
  Discount: 0.99
  GAE $\tau$: 0.95
  Advantage Clip $\epsilon$: 0.2
  KL Threshold: 0.02
  Gradient Norm: 1.0
  Optimizer: Adam
  Goal Update $d_{\rm tol}$: 0.15

Student
  TCN Input Dim: [30, N]
  TCN Hidden Units: [N, N]
  TCN Filters: [N, N, N]
  TCN Kernel: [9, 5, 5]
  TCN Stride: [2, 1, 1]
  TCN Activation: ReLU
  Latent Vector Dim $z$: 8
  Policy Hidden Units: [512, 256, 128]
  Policy Activation: ELU
  Learning Rate: $3\times10^{-4}$
  Num Envs: 8192
  Batch Size: 8192
  Num Mini Epochs: 1
  Optimizer: Adam
  Goal Update $d_{\rm tol}$: 0.25
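The student consumes a rolling 30-step observation window of width $N$; a minimal buffer sketch (the class name and zero-initialization are ours):

```python
from collections import deque

class ObsHistory:
    """Rolling observation window feeding the student TCN
    (input shape [30, N]; N = 79, 83, or 95 by tactile mode)."""
    def __init__(self, n_obs, horizon=30):
        self.buf = deque([[0.0] * n_obs for _ in range(horizon)],
                         maxlen=horizon)

    def push(self, obs):
        """Append the newest observation and return the full window."""
        self.buf.append(list(obs))
        return [row[:] for row in self.buf]  # oldest first
```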

Appendix H Tactile Observation Model

Data Collection. The setup for tactile feature extraction is shown in Figure 10. We collect data by tapping and shearing the sensor on a flat stimulus fixed onto a force-torque sensor, collecting six labels for training: contact depth $z$, contact pose in $R_{x}$ and $R_{y}$, and contact forces $F_{x}$, $F_{y}$, and $F_{z}$. To capture the contact features needed for the in-hand object rotation task, we sample sensor poses within the ranges shown in Table 6, which provide sensing ranges for contact pose between $[-28^{\circ},28^{\circ}]$ and contact forces of up to 5 N.

Training. The architecture and training parameters of the observation model are shown in Table 7. For each fingertip sensor, we collect 3000 images (2400 train, 600 test) and train separate models. The prediction error for one of the sensors is shown in Figure 11.

Table 6: Sampled sensor pose ranges.

  Depth $z$ (mm): [-1, -4]
  Shear $S_{x}$ (mm): [-2, 2]
  Shear $S_{y}$ (mm): [-2, 2]
  Rotation $R_{x}$ (deg): [-28, 28]
  Rotation $R_{y}$ (deg): [-28, 28]
Table 7: Observation model architecture and training parameters.

  Conv Input Dim: [240, 135]
  Conv Filters: [32, 32, 32, 32]
  Conv Kernel: [11, 9, 7, 5]
  Conv Stride: [1, 1, 1, 1]
  Max Pooling Kernel: [2, 2, 2, 2]
  Max Pooling Stride: [2, 2, 2, 2]
  Output Dim: 6
  Batch Normalization: True
  Activation: ReLU
  Learning Rate: $1\times10^{-4}$
  Batch Size: 16
  Num Epochs: 100
  Optimizer: Adam
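Assuming unpadded (valid) convolutions, each conv block from Table 7 followed by 2×2 max pooling shrinks the 240×135 input as computed below. The padding scheme is not stated in the paper, so this is a sketch of the shape arithmetic rather than the exact architecture:

```python
def block_out(size, kernel, stride=1, pool=2):
    """Spatial size after one valid conv + 2x2 max-pool block."""
    return ((size - kernel) // stride + 1) // pool

h, w = 240, 135
for k in (11, 9, 7, 5):  # the four conv kernels from Table 7
    h, w = block_out(h, k), block_out(w, k)
# h, w now hold the final feature-map size before the 6-dim output head
```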


Appendix I Tactile Image Processing

The tactile sensors provide raw RGB images from the camera module. We use an exposure setting of 312.5 and a resolution of $640\times480$, providing a frame rate of up to 30 FPS. The images are then post-processed to compute the tactile observations. For all tactile observations, we convert the raw image to greyscale and rescale it to $240\times135$.
Binary Contact: We apply a median blur filter with an aperture size of 11, followed by adaptive thresholding with a block size of 55 pixels and a constant offset of -2. These operations smooth the image and filter out unwanted noise. The post-processed image is compared with a reference image using the Structural Similarity Index (SSIM) to compute a binary contact signal (0 or 1); we use an SSIM threshold of 0.6 for contact detection.
Contact Pose and Force: We directly use the resized greyscale image for contact force and pose prediction. From the target labels, we use the contact pose ($R_{x}$, $R_{y}$) and the contact force components $F_{x}$, $F_{y}$, $F_{z}$ (to compute the contact force magnitude $\|F\|$) to construct the dense tactile representation used during policy training. We use the binary contact signal to mask contact pose and force, thresholding the predictions at $\approx 0.25$ N.

Appendix J Real-world Deployment

Tactile Sensor Design. The design of the sensor is based on the DigiTac version [39] of the TacTip [37, 58], a soft optical tactile sensor that infers contact information from the motion of marker-tipped pins beneath its sensing surface. Here, we have redesigned the DIGIT base to be more compact, with a new PCB, a modular camera, and a lighting system (Figure 13). We also improved the morphology of the skin and base connector to provide a larger and smoother sensing surface for greater fingertip dexterity. The tactile sensor skin and base are entirely 3D printed, using Agilus 30 for the skin and Vero-series materials for the pin-tip markers and the casings. Each base contains a camera driver board that connects to the computer via USB and can be streamed asynchronously at a frame rate of 30 FPS. We perform post-processing in real time using OpenCV [59].

Sensor Placement. As the tactile fingertips are primarily sensorized over a front-facing area, we experimented with different orientations relative to the fingers to maximize contact sensing during in-hand object rotation. We placed the tactile fingertips with offsets (thumb, index, middle, ring) = ($-45^{\circ}$, $-45^{\circ}$, $0^{\circ}$, $45^{\circ}$).

Control Pipeline. Each tactile observation model is deployed together with the policy as shown in Figure 13. We stream tactile and proprioception readings asynchronously at 20 Hz. The joint positions are used by a forward kinematic solver to compute fingertip position and orientation. The relative joint positions obtained from the policy are converted to target joint commands. This is published to the Allegro Hand and converted to torque commands by a PD controller at 300 Hz.
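The relative-action step described above amounts to the following conversion (a sketch; the optional joint-limit clamping is our assumption):

```python
def to_target_joints(q_now, delta_q, q_limits=None):
    """Convert the policy's relative joint-position action into an
    absolute target joint command for the PD controller."""
    target = [q + d for q, d in zip(q_now, delta_q)]
    if q_limits is not None:  # optional clamping to per-joint limits
        target = [max(lo, min(hi, t))
                  for t, (lo, hi) in zip(target, q_limits)]
    return target
```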


Object Properties. Various physical properties of the objects used in the real-world experiment are shown in Table 8. We include objects of different sizes and shapes not seen during training.

Table 8: Real-world object set.

  Object           Dimensions (mm)   Mass (g)
  Plastic Apple    75 × 75 × 70      60
  Plastic Orange   70 × 72 × 72      52
  Plastic Pepper   61 × 68 × 65      10
  Plastic Peach    62 × 56 × 55      30
  Plastic Lemon    52 × 52 × 65      33
  Tin Cylinder     45 × 45 × 63      30
  Cube             51 × 51 × 51      65
  Gum Box          90 × 80 × 76      89
  Container        90 × 80 × 76      32
  Rubber Toy       80 × 53 × 48      27

Appendix K Additional Experiments

K.1 Hyperparameters

We provide additional ablation studies to analyze the design choices of our auxiliary goal formulation. The effects of the goal update tolerance $d_{\rm tol}$ used during student training and of the goal increment interval are shown in Table 9.

Performance is significantly affected by the goal-update tolerance. As the tolerance was reduced during student training, the number of average rotations and successive goals reached per episode also reduced, suggesting that the teacher policy transferred poorly and the student could not learn the multi-axis object rotation skill effectively. Increasing the goal increment interval also resulted in fewer rotations achieved.

Table 9: Ablations on the goal update tolerance and goal increment interval.

  Goal Update Tolerance        Rot    TTT (s)   #Success
  $d_{\rm tol}=0.15$           0.75   28.1      3.07
  $d_{\rm tol}=0.20$           1.36   27.7      4.48
  $d_{\rm tol}=0.25$ (ours)    1.77   27.2      5.26

  Goal Increment Interval      Rot    TTT (s)   #Success
  $\theta=50^{\circ}$          1.30   27.1      3.86
  $\theta=40^{\circ}$          1.50   26.7      4.36
  $\theta=30^{\circ}$ (ours)   1.77   27.2      5.26

K.2 Rotating Hand

We test the robustness of the policy by performing in-hand object rotation during different hand movements. In particular, we choose hand trajectories where the gravity vector is continuously changing relative to the orientation of the hand, adding greater complexity to this task. Rollouts for three different hand trajectories are shown in Figure 14. In particular, for the third hand trajectory (iii), we demonstrate the capability of the robot hand to servo around the surface of the object in different directions while keeping the object almost stationary in free space. This motion also demonstrates the ability to command different target rotation axes during deployment, offering a useful set of primitives for other downstream tasks.


References
