github evalplus/evalplus v0.1.0
EvalPlus v0.1.0 and Pre-Generated LLM Code Samples for HumanEval+


What is this?

In addition to the initial release of the EvalPlus source code, we release pre-generated code samples from LLMs on HumanEval+ (also applicable to the base HumanEval) and regularized ground-truth solutions. With these we hope to accelerate future research: researchers can reuse our pre-generated code instead of generating it from scratch.

  • ${MODEL_NAME}_temp_${TEMPERATURE}.zip: LLM-produced program samples
  • HumanEvalPlusGT.zip: The re-implemented ground-truth solutions
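As a sketch, the archive name for a given model and temperature can be reconstructed from the `${MODEL_NAME}_temp_${TEMPERATURE}.zip` pattern above (the helper name and the exact formatting of the temperature are assumptions for illustration, not part of this release):

```python
# Illustrative helper for the ${MODEL_NAME}_temp_${TEMPERATURE}.zip
# naming scheme. The exact temperature formatting is an assumption.
def sample_archive_name(model_name: str, temperature: float) -> str:
    """Build the archive file name for one model/temperature pair."""
    return f"{model_name}_temp_{temperature}.zip"

print(sample_archive_name("codegen-16b", 0.8))  # codegen-16b_temp_0.8.zip
print(sample_archive_name("chatgpt", 0.0))      # chatgpt_temp_0.0.zip
```

The contents and internal layout of each archive are not described here; inspect a downloaded zip (e.g. with Python's `zipfile` module) before relying on a particular structure.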

Data sources

The configuration of the pre-generated code follows our preprint: https://arxiv.org/abs/2305.01210

  • We evaluated over:
    • x 14 models (10 model types)
    • x 5 temperature settings: 0 (for greedy decoding) as well as {0.2, 0.4, 0.6, 0.8}
    • x 200 code samples for each random-sampling (i.e., non-greedy) setting
  • We use nucleus sampling with top-p = 0.95 for all HuggingFace-based models
  • CodeGen-6B and CodeGen-16B are accelerated by FauxPilot (thanks!)
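To illustrate the nucleus-sampling setting above, here is a minimal pure-Python sketch of top-p filtering with p = 0.95 (not EvalPlus code; the function names are hypothetical):

```python
# Minimal sketch of nucleus (top-p) sampling: keep the smallest set of
# highest-probability tokens whose cumulative probability reaches p,
# renormalize over that set, then sample from it.
import random

def top_p_filter(probs, p=0.95):
    """Return the renormalized distribution over the top-p nucleus."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}

def sample_top_p(probs, p=0.95, rng=random):
    """Draw one token index from the top-p filtered distribution."""
    dist = top_p_filter(probs, p)
    r = rng.random()
    cum = 0.0
    for i, q in dist.items():
        cum += q
        if r <= cum:
            return i
    return i  # floating-point fallback: return the last kept token

# Example: the lowest-probability token falls outside the nucleus.
probs = [0.05, 0.10, 0.15, 0.70]
nucleus = top_p_filter(probs, p=0.95)  # token 0 is truncated
```

Note that temperature 0 (greedy decoding) bypasses sampling entirely: the argmax token is taken at every step, so each model produces a single deterministic sample.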


Evaluated results

We evaluate the generated samples against the test cases of both the base HumanEval and our enhanced HumanEval+:

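The scoring loop behind such results can be sketched as follows: execute each sample against a task's test assertions and record pass/fail. This is a simplified illustration with hypothetical names; a real harness such as EvalPlus additionally isolates execution and enforces timeouts.

```python
# Simplified pass/fail scoring of LLM-generated samples against a test
# suite. WARNING: exec() runs arbitrary code; a real harness sandboxes
# this and applies per-sample timeouts.

def run_sample(candidate_src: str, test_src: str) -> bool:
    """Return True if the candidate passes all assertions."""
    env = {}
    try:
        exec(candidate_src, env)  # define the candidate function
        exec(test_src, env)       # run the task's assertions against it
        return True
    except Exception:
        return False

# Toy task: two candidate samples, one correct and one buggy.
test_src = "assert add(2, 3) == 5\nassert add(-1, 1) == 0"
good = "def add(a, b):\n    return a + b"
bad = "def add(a, b):\n    return a - b"

results = [run_sample(s, test_src) for s in (good, bad)]  # [True, False]
```

Running the same samples against both the base and the enhanced test suites is what lets HumanEval+ surface wrong solutions that the base tests miss.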

Call for contribution

We also encourage open-source developers to contribute to LLM4Code research by: (i) reproducing and validating our results; (ii) uploading LLM-generated samples to reproduce the results of new models; and, of course, (iii) trying out our enhanced dataset for more accurate and trustworthy results!
