MIMO techniques use multiple antennas at both the transmitter and receiver sides to achieve diversity gain, multiplexing gain, or both. One of the key challenges in exploiting the potential of MIMO systems is to design high-throughput, low-complexity detection algorithms while achieving near-optimal performance. In this thesis, we design and optimize algorithms for MIMO detection and investigate the associated performance and FPGA implementation aspects.First, we study and optimize a detection algorithm developed by Shabany and Gulak for a K-Best based high throughput and low energy hard output MIMO detection and expand it to the complex domain. The new method uses simple lookup tables, and it is fully scalable for a wide range of K-values and constellation sizes. This technique reduces the computational complexity, without sacrificing performance and the complexity scales only sub-linearly with the constellation size. Second, we apply the bidirectional technique to trellis search and propose a high performance soft output bidirectional path preserving trellis search (PPTS) detector for MIMO systems. The comparative error analysis between single direction and bidirectional PPTS detectors is given. We demonstrate that the bidirectional PPTS detector can minimize the detection error. Next, we design a novel bidirectional processing algorithm for soft-output MIMO systems. It combines features from several types of fixed complexity tree search procedures. The proposed approach achieves a higher performance than previously proposed algorithms and has a comparable computational cost. Moreover, its parallel nature and fixed throughput characteristics make it attractive for very large scale integration (VLSI) implementation.Following that, we present a novel low-complexity hard output MIMO detection algorithm for LTE and WiFi applications. We provide a well-defined tradeoff between computational complexity and performance. The proposed algorithm uses a much smaller number of Euclidean distance (ED) calculations while attaining only a 0.5dB loss compared to maximum likelihood detection (MLD). A 3x3 MIMO system with a 16QAM detector architecture is designed, and the latency and hardware costs are estimated.Finally, we present a stochastic computing implementation of trigonometric and hyperbolic functions which can be used for QR decomposition and other wireless communications and signal processing applications.