We benchmarked 40 LLMs on a 40-item travel medicine quiz and used Bayesian modelling to evaluate accuracy, consistency, parsability, and cost. Accuracy ranged from 27.9% to 97.5%; reasoning-tuned frontier models (OpenAI o3, Perplexity Sonar Reasoning) topped the benchmark, whereas small local models underperformed. Cost-accuracy curves revealed five Pareto-optimal systems, with o3 currently performing best. These findings document how current LLMs perform as public health knowledge support systems.
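
To make the notion of a Pareto-optimal system concrete: a system is Pareto optimal if no other system is both at least as cheap and at least as accurate, and strictly better on one of the two. The following is a minimal sketch of that selection rule, not the analysis code used in the study. The `System` type, the `pareto_front` function, the model names, and all cost and accuracy values are hypothetical and chosen only for illustration.

```python
from typing import List, NamedTuple


class System(NamedTuple):
    name: str
    cost: float      # hypothetical cost per quiz run (arbitrary units)
    accuracy: float  # fraction of correct answers, in [0, 1]


def pareto_front(systems: List[System]) -> List[System]:
    """Return the systems that no other system dominates.

    A system is kept only if it is strictly more accurate than every
    system that costs the same or less. Exact duplicates (same cost and
    same accuracy) are collapsed to a single entry.
    """
    front: List[System] = []
    best_accuracy = float("-inf")
    # Scan from cheapest to most expensive; on equal cost, the most
    # accurate system is visited first.
    for s in sorted(systems, key=lambda s: (s.cost, -s.accuracy)):
        if s.accuracy > best_accuracy:
            front.append(s)
            best_accuracy = s.accuracy
    return front


# Hypothetical values for illustration only (not results from the study).
demo = [
    System("model_a", cost=0.0, accuracy=0.40),
    System("model_b", cost=1.0, accuracy=0.70),
    System("model_c", cost=2.0, accuracy=0.65),  # dominated by model_b
    System("model_d", cost=5.0, accuracy=0.95),
    System("model_e", cost=6.0, accuracy=0.95),  # dominated by model_d
]

front_names = [s.name for s in pareto_front(demo)]
print(front_names)
```

Sorting by cost first means a single pass is enough: any system that fails to improve on the best accuracy seen so far is dominated by a cheaper or equally priced alternative.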