K-best sphere decoding is one of the most popular MIMO (Multi-Input Multi-Output) detection algorithms because of its low complexity and near-Maximum Likelihood (ML) Bit Error Rate (BER) performance. Unfortunately, conventional multi-stage sphere decoders cannot adapt to varying antenna configurations and must be redesigned for each specific array structure. In this paper, we propose a reconfigurable in-place architecture that scales to an arbitrary number of antennas at run-time while significantly reducing area compared with other sphere decoders. To improve the throughput of the in-place architecture without any degradation in BER performance, we propose partial-sort-bypass and symbol interleaving techniques, and also exploit a multi-core design. Implementation results for a 16-QAM MIMO decoder in a 130 nm CMOS technology show a 41% reduction in area compared with the smallest sphere decoder, together with better throughput, while maintaining antenna reconfigurability. When implemented for the 802.11n standard, our architecture achieves a 42% reduction in area compared with the multi-stage architecture.
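
For readers unfamiliar with the baseline algorithm, the following is a minimal software sketch of generic K-best detection, not of the proposed in-place hardware architecture. It assumes a real-valued system model y ≈ R s in which the channel has already been reduced to an upper-triangular matrix R (for example by QR decomposition), and a 4-PAM alphabet per real dimension as in a real-valued decomposition of 16-QAM. The names `k_best_detect` and `PAM4` are hypothetical and chosen only for illustration.

```python
PAM4 = (-3, -1, 1, 3)  # assumed per-dimension alphabet (real-valued 16-QAM)


def k_best_detect(R, y, K, alphabet=PAM4):
    """Breadth-first K-best search on y ~= R s, with R upper triangular.

    Layers are processed from the last row to the first. At each layer every
    survivor is expanded with all alphabet symbols, and only the K children
    with the smallest accumulated partial Euclidean distance (PED) are kept.
    """
    n = len(y)
    # Each survivor is (accumulated PED, symbols decided for layers i+1..n-1).
    survivors = [(0.0, ())]
    for i in range(n - 1, -1, -1):
        children = []
        for ped, tail in survivors:
            # Contribution of already-decided symbols; tail[0] is s[i+1].
            interference = sum(R[i][j] * tail[j - i - 1] for j in range(i + 1, n))
            for s in alphabet:
                e = y[i] - interference - R[i][i] * s
                children.append((ped + e * e, (s,) + tail))
        # A full sort is used here for clarity only.
        children.sort(key=lambda c: c[0])
        survivors = children[:K]
    best_ped, best = survivors[0]
    return list(best), best_ped


# Noise-free toy example: y = R s with s = [3, -1].
R = [[2.0, 0.5],
     [0.0, 1.5]]
y = [5.5, -1.5]
print(k_best_detect(R, y, K=2))  # ([3, -1], 0.0)
```

In this sketch the number of layers is simply the length of `y`, so the same loop body is reused for any antenna count; the number of survivors K is the knob that trades search complexity against how closely the result approaches ML detection.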