There are several ways to evaluate the inference abilities of large language models. Speed is a key differentiator: models with more efficient inference deliver faster response times, often measured as latency or tokens generated per second.
Cost is another area worth evaluating, especially if you're paying to run the model, and a range of tools and techniques can help reduce the expense of running large AI models. Arm's (ARM -1.24%) CPU architecture, found in almost every smartphone, is being adopted for AI workloads because of its power efficiency.
Accuracy is also a valuable measure of inference capability, and it depends on both how a model was trained and how it performs inference. It can be tested simply by posing the same questions to different models and seeing which ones give the best answers and make the fewest mistakes.
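The side-by-side comparison described above can be sketched in a few lines of Python. The two model functions here are hypothetical stand-ins, not any vendor's real API; in practice each would wrap a call to the model being evaluated:

```python
import time

# Hypothetical model callables for illustration only -- a real evaluation
# would replace these with API calls to the models being compared.
def model_a(question):
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Paris"}
    return answers.get(question, "I don't know")

def model_b(question):
    answers = {"What is 2 + 2?": "4", "Capital of France?": "Lyon"}
    return answers.get(question, "I don't know")

def evaluate(model, qa_pairs):
    """Pose the same questions to a model; return (accuracy, avg latency)."""
    correct = 0
    start = time.perf_counter()
    for question, expected in qa_pairs:
        if model(question).strip() == expected:
            correct += 1
    elapsed = time.perf_counter() - start
    return correct / len(qa_pairs), elapsed / len(qa_pairs)

qa_pairs = [("What is 2 + 2?", "4"), ("Capital of France?", "Paris")]
for name, model in [("model_a", model_a), ("model_b", model_b)]:
    accuracy, avg_latency = evaluate(model, qa_pairs)
    print(f"{name}: accuracy={accuracy:.0%}, avg latency={avg_latency:.6f}s")
```

The same loop covers both axes at once: the exact-match score captures accuracy, while the timer captures response speed for each model on an identical workload.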