Modern digital communication systems usually employ convolutional codes with large constraint lengths to achieve good decoding performance, which leads to high complexity and power consumption in Viterbi decoders. It is essential to use the T-algorithm in Viterbi decoders to prune a significant portion of the trellis states and thereby dramatically reduce power consumption. However, the search for the best path metric inside the add-compare-select loop of the T-algorithm significantly limits the clock speed. In this paper, we propose an efficient pre-computation-based architecture for Viterbi decoders incorporating the T-algorithm. Through optimization at both the algorithm level and the architecture level, the new architecture greatly shortens the long critical path introduced by the conventional T-algorithm. The design example provided in this work demonstrates a more than twofold improvement in clock speed with negligible computation overhead while maintaining decoding performance.
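To make the bottleneck concrete, the following is a minimal software sketch of one add-compare-select step with conventional T-algorithm pruning. It is an illustration only, not the proposed pre-computation architecture. The function name, the data layout, the convention that a smaller path metric is better, and the rule that a state survives when its metric is within T of the best metric are all assumptions made for this example.

```python
import math


def acs_step_with_t_algorithm(prev_metrics, predecessors, branch_metric, threshold):
    """One add-compare-select (ACS) step followed by T-algorithm pruning.

    Hypothetical interface, assumed for illustration:
      prev_metrics  -- dict: surviving state -> path metric (smaller is better)
      predecessors  -- dict: next state -> list of states with a branch into it
      branch_metric -- dict: (previous state, next state) -> non-negative cost
      threshold     -- the pruning threshold T
    """
    # Add-compare-select: keep the best incoming path for each next state.
    candidates = {}
    for next_state, preds in predecessors.items():
        best = math.inf
        for prev_state in preds:
            if prev_state in prev_metrics:  # pruned states contribute nothing
                metric = prev_metrics[prev_state] + branch_metric[(prev_state, next_state)]
                best = min(best, metric)
        if best < math.inf:
            candidates[next_state] = best

    if not candidates:
        return {}

    # The search for the best path metric depends on every ACS result above.
    # This dependency is what lengthens the critical path in hardware.
    pm_opt = min(candidates.values())

    # Purge every state whose metric exceeds the best metric by more than T.
    return {s: m for s, m in candidates.items() if m <= pm_opt + threshold}


if __name__ == "__main__":
    # Toy 4-state trellis with made-up metrics, for illustration only.
    predecessors = {0: [0, 2], 1: [0, 2], 2: [1, 3], 3: [1, 3]}
    branch_metric = {
        (0, 0): 0, (0, 1): 2, (2, 0): 1, (2, 1): 1,
        (1, 2): 1, (1, 3): 2, (3, 2): 0, (3, 3): 1,
    }
    prev_metrics = {0: 0, 1: 3, 2: 5, 3: 9}
    print(acs_step_with_t_algorithm(prev_metrics, predecessors, branch_metric, 3))
    # prints {0: 0, 1: 2}
```

In this sketch the comparison against `pm_opt + threshold` cannot start until the minimum over all candidate metrics is known, which mirrors the serial dependency that limits clock speed in the conventional T-algorithm.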