How does speculative decoding achieve fast inference from transformers, and what are the key techniques or algorithms involved? How does it improve the speed and efficiency of transformer models in practical applications, and what trade-offs or limitations does it have compared to traditional decoding methods?