According to the paper "AttackEval: How to Evaluate the Effectiveness of Jailbreak Attacking on Large Language Models," what is the purpose of developing a comprehensive ground truth dataset for jailbreak tasks?
a. To test the computational efficiency of LLMs
b. To serve as a benchmark for evaluating jailbreak attack prompts
c. To train new models for future research
d. To promote the commercial use of LLMs