No studies have compared physical vs. combined imagery/physical practice in music training. Motion capture measured hand and stick movements of 28 percussionists assigned to a physical (P) or combined imagery/physical (IP) group. Both groups practiced an excerpt on an electronic drum to a metronome. During acquisition, the P group physically performed the excerpt on all 40 trials. The IP group physically performed the excerpt on 20 trials and imagined performing it on the other 20 trials. Pre-test and post-test trials were obtained before and after acquisition. Participants also completed a survey measuring affect and motivation to continue training. Temporal errors were computed by subtracting MIDI note onset data from tempo-defined note onsets for pre- and post-test trials. Temporal errors and kinematics differed only across time (pre- vs. post-test), not between groups. Temporal errors decreased from pre-test (51.4 ms) to post-test (37.9 ms). Both sticks initiated strokes from higher average positions in post-test (left: 12.3 cm; right: 13.4 cm) vs. pre-test (left: 11 cm; right: 6.2 cm). Hand velocity was greater in post-test (left: 16.7 cm/s; right: 15.8 cm/s) vs. pre-test (left: 13.9 cm/s; right: 13.8 cm/s). The IP group reported less perceived effort during training (3.8) than the P group (5.3), and greater interest in continuing imagery/physical training over the short term (5) and indefinitely (5.2) than the P group reported for continuing physical training (short term = 3.6; indefinitely = 3.5). Combined IP practice yields rhythmic accuracy comparable to P practice while potentially enhancing training adherence via reduced perceived effort.
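The temporal-error measure described above (tempo-defined onset minus MIDI onset) can be illustrated with a minimal Python sketch. The function names, the tempo, the beat positions, and the onset values are hypothetical and are not taken from the study data. Averaging the absolute value of the signed errors is an assumption about how the per-trial errors were summarized.

```python
def expected_onsets_ms(bpm, beat_positions):
    """Tempo-defined onset times (ms) for notes at the given beat positions."""
    ms_per_beat = 60_000.0 / bpm
    return [beat * ms_per_beat for beat in beat_positions]


def temporal_errors_ms(expected_ms, midi_ms):
    """Signed error per note: tempo-defined onset minus MIDI onset.

    Negative values mean the note was played late, positive values early.
    """
    if len(expected_ms) != len(midi_ms):
        raise ValueError("expected and MIDI onset lists must be the same length")
    return [e - m for e, m in zip(expected_ms, midi_ms)]


def mean_absolute_error_ms(errors_ms):
    """Assumed summary statistic: mean of the absolute signed errors."""
    return sum(abs(err) for err in errors_ms) / len(errors_ms)


# Illustrative values only: four eighth notes at 120 bpm.
expected = expected_onsets_ms(120, [0, 0.5, 1, 1.5])
recorded = [10.0, 240.0, 530.0, 750.0]  # hypothetical MIDI onsets (ms)
errors = temporal_errors_ms(expected, recorded)
summary = mean_absolute_error_ms(errors)
```

With these made-up onsets, the signed errors show one early note, two late notes, and one note exactly on the beat; the summary value collapses them into a single per-trial figure of the kind reported for pre- and post-test.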