In the context of machine learning, what is a key advantage of using sparse transformers for generating long sequences?
A) They require more training data than dense transformers.
B) They reduce computational complexity by focusing on a subset of token interactions.
C) They increase memory usage to handle larger model sizes.
D) They eliminate the need for attention mechanisms in sequence generation.
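The complexity reduction described in option B can be sketched by counting query-key interactions under dense versus windowed (local) sparse attention. This is a minimal, illustrative example, not any particular library's implementation; the function names and the 4096/128 sizes are hypothetical.

```python
def dense_interactions(seq_len):
    # Dense self-attention: every token attends to every token -> O(n^2) pairs.
    return seq_len * seq_len

def sparse_window_interactions(seq_len, window):
    # Windowed sparse attention: each token attends only to tokens within
    # a fixed-size window around it -> O(n * w) pairs.
    total = 0
    for i in range(seq_len):
        lo = max(0, i - window)
        hi = min(seq_len, i + window + 1)
        total += hi - lo
    return total

if __name__ == "__main__":
    n, w = 4096, 128
    print(dense_interactions(n))             # 16777216 pairs
    print(sparse_window_interactions(n, w))  # far fewer: roughly n * (2w + 1)
```

For a 4096-token sequence, the windowed variant computes roughly n·(2w+1) ≈ 1M interactions instead of n² ≈ 16.8M, which is why sparse patterns make long-sequence generation tractable.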