What is this?
In addition to the initial version of the EvalPlus source code, we release the pre-generated code of LLMs on HumanEval+ (also applicable to the base HumanEval) and the regularized ground-truth solutions. We hope these accelerate future research: researchers can reuse our pre-generated code instead of generating it from scratch.
- `${MODEL_NAME}_temp_${TEMPERATURE}.zip`: LLM-produced program samples
- `HumanEvalPlusGT.zip`: the re-implemented ground-truth solutions
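The archive naming scheme above can be resolved programmatically. A minimal sketch, where the model names and temperature values are illustrative assumptions rather than the exact set of released files:

```python
def archive_name(model_name: str, temperature: float) -> str:
    """Build a filename following ${MODEL_NAME}_temp_${TEMPERATURE}.zip."""
    return f"{model_name}_temp_{temperature}.zip"

# Hypothetical examples; the actual release may use different model names.
print(archive_name("codegen-6b", 0.0))
print(archive_name("codegen-6b", 0.8))
```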
Data sources
The configuration of the pre-generated code follows our pre-print paper: https://arxiv.org/abs/2305.01210
- We evaluated:
- 14 models (10 model types)
- 5 temperature settings: zero temperature (for greedy decoding) plus {0.2, 0.4, 0.6, 0.8}
- 200 code samples per model for each random-sampling (i.e., non-greedy) setting
- We use nucleus sampling with top-p = 0.95 for all HuggingFace-based models
- CodeGen-6B and CodeGen-16B are accelerated by FauxPilot (thanks!)
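The sampling grid above can be sketched as follows. Only the temperatures, the top-p value, and the 200-sample count come from the list; the configuration-dict layout, and the assumption that greedy decoding yields a single sample, are illustrative:

```python
# Sampling grid from the description above.
TEMPERATURES = [0.0, 0.2, 0.4, 0.6, 0.8]   # 0.0 => greedy decoding
TOP_P = 0.95                                # nucleus sampling for HF models
SAMPLES_PER_RANDOM_SETTING = 200

def sampling_configs():
    """Yield one decoding configuration per temperature setting."""
    for t in TEMPERATURES:
        greedy = (t == 0.0)
        yield {
            "temperature": t,
            "do_sample": not greedy,                 # no sampling when greedy
            "top_p": None if greedy else TOP_P,
            # One sample for greedy decoding is an assumption for illustration.
            "num_samples": 1 if greedy else SAMPLES_PER_RANDOM_SETTING,
        }

configs = list(sampling_configs())
print(len(configs))  # 5 temperature settings
```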
Evaluation results
We compute the results from the samples using the test cases of both the base HumanEval and our enhanced HumanEval+:
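Results of this kind are typically reported as pass@k. As one possible way to reproduce such numbers from the released samples, here is a sketch of the standard unbiased pass@k estimator (from the Codex paper); it is illustrative and may differ from the exact evaluation script:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: of n generated samples, c are correct.

    Returns the probability that at least one of k samples drawn
    without replacement passes all tests.
    """
    if n - c < k:
        return 1.0  # too few failures for a fully failing draw
    return 1.0 - comb(n - c, k) / comb(n, k)

# Hypothetical counts: 200 samples, 50 passing => pass@1 = 0.25
print(pass_at_k(200, 50, 1))
```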
Call for contribution
We also encourage open-source developers to contribute to LLM4Code research by: (i) reproducing and validating our results; (ii) uploading LLM-generated samples to reproduce the results of new models; and, of course, (iii) trying out our enhanced dataset to obtain more accurate and trustworthy results!