prompt format in code completion

#1
by LeiLeier - opened

Could you please provide the specific details of the prompt format used by this model in code completion scenarios?

Kwaipilot org

Code Completion

text = "#write a quick sort algorithm"

Code Insertion

text = "<|fim▁begin|>Enc(prefix)<|fim▁hole|>Enc(suffix)<|fim▁end|>"


————————————————————————
I built a local dataset using the format above, but my validation of the model does not work well. Could you please tell me whether there are other factors that matter, such as length limits on the prefix and suffix?

The model's input is limited to 4k tokens, but there are no strict limits on the prefix or suffix individually.
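If it helps, here is a rough sketch (a heuristic of ours, not an official recipe; it assumes the tokenizer loads via transformers' AutoTokenizer) of how you could trim the prefix and suffix so the assembled FIM prompt stays within a 4k-token budget:

from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("Kwaipilot/KwaiCoder-23B-A4B-v1")

def fit_fim(prefix: str, suffix: str, budget: int = 4096, reserve: int = 256) -> str:
    # Leave headroom for the FIM special tokens and the generated infill.
    limit = budget - reserve
    pre_ids = tok(prefix, add_special_tokens=False)["input_ids"]
    suf_ids = tok(suffix, add_special_tokens=False)["input_ids"]
    # Heuristic: keep the end of the prefix and the start of the suffix,
    # giving each side roughly half the budget (the split ratio is arbitrary).
    half = limit // 2
    keep_pre = min(len(pre_ids), max(half, limit - len(suf_ids)))
    keep_suf = min(len(suf_ids), limit - keep_pre)
    prefix = tok.decode(pre_ids[len(pre_ids) - keep_pre:])
    suffix = tok.decode(suf_ids[:keep_suf])
    return f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"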

Kwaipilot org

Can you share your inference script?

text="<|fim▁begin|>\n MchSettleBehaviorHandler mchHandler = MchStrategyHandlerFactory.getStrategyHandlerType(mchStrategyConfig.getStrategyType());\n for (RechargeRecord rechargeRecord : rechargeRecordList) {\n try {\n // 异步削峰发起\n subAccountDriveThreadPool.submit((<|fim▁hole|>))\n mchHandler.subAccount(rechargeRecord, mchStrategyConfig);\n } catch (Exception e) {\n logger.error("商户分账异常 rechargeRecordNo:{}", rechargeRecord.getRecordNo(), e);\n }\n }\n<|fim▁end|>"

model_result=""

expected_result=") -> mchHandler.subAccount(rechargeRecord, mchStrategyConfig"

Kwaipilot org

Looks like there is a syntax error in your query; a semicolon is missing at the end of this line:
// 异步削峰发起\n subAccountDriveThreadPool.submit((<|fim▁hole|>));\n
If that is fixed, the model output will match your expected_result.

**Test in transformers**

I also used vLLM to reproduce this result and got the correct answer. Below are my script and the model's input/output.

from vllm import LLM, SamplingParams

# Init the model; temperature=0.0 gives greedy decoding, up to 50 new tokens.
llm = LLM(model="Kwaipilot/KwaiCoder-23B-A4B-v1")
sampling_params = SamplingParams(temperature=0.0, max_tokens=50)

print("welcome")

while True:
    prompt = input("Query:")
    if prompt.strip() == "":
        print("exit")
        break

    outputs = llm.generate([prompt], sampling_params, )
    
    for output in outputs:
        generated_text = output.outputs[0].text
        print(f"prompt: {prompt!r}\nresult: {generated_text!r}\n")

Here is the result:


Input:
<|fim▁begin|>\n MchSettleBehaviorHandler mchHandler = MchStrategyHandlerFactory.getStrategyHandlerType(mchStrategyConfig.getStrategyType());\n for (RechargeRecord rechargeRecord : rechargeRecordList) {\n try {\n // 异步削峰发起\n subAccountDriveThreadPool.submit((<|fim▁hole|>));\n mchHandler.subAccount(rechargeRecord, mchStrategyConfig);\n } catch (Exception e) {\n logger.error("商户分账异常 rechargeRecordNo:{}", rechargeRecord.getRecordNo(), e);\n }\n }\n<|fim▁end|>

Output:
) -> mchHandler.subAccount(rechargeRecord, mchStrategyConfig

Can I add some more context information to the text? Below is an example.

text = "// context start\n@Data\n@NoArgsConstructor\n@AllArgsConstructor\n@Builder\npublic class CustomerInfoRespDTO{\n CustomerInfoRespDTO ()\n CustomerInfoRespDTO ()\n boolean equals ()\n boolean canEqual ()\n int hashCode ()\n CustomerInfoRespDTOBuilder builder ()\ngetter/setter: getPartnerId, getOutTradeNo, setPartnerId, setOutTradeNo\n}\n// context end\n\n<|fim▁begin|>\nimport javax.annotation.Resource;\n@Slf4j\n@Service\npublic class WebankTradeImpl implements PurchasePaymentService {\n\n @Resource \n private WebankClient webankClient;\n\n @Resource \n private WebankBuilder webankBuilder;\n\n @Override \n public PurchaseChannelEnums getPurchaseChannel() {\n return PurchaseChannelEnums.WEBANK;\n }\n\n /**\n * 同步客户信息\n * @param reqDTO\n * @return \n * @throws Exception\n */\n @Override \n public CustomerInfoRespDTO syncCustomerInfo(CustomerInfoReqDTO reqDTO) throws Exception {\n l<|fim▁hole|>\n String response = webankClient.sendWebank(webankBuilder.buildWebankCustomerUploadReqDTO(reqDTO), WebankCustomerUploadReqDTO.TRS_NAME);\n }\n<|fim▁end|>"

Kwaipilot org

You can add it as a comment.
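A rough sketch of what that could look like, reusing the "// context start" / "// context end" markers from your example (the helper name and the per-line "//" prefix are just our convention):

def with_context(context: str, prefix: str, suffix: str) -> str:
    # Prepend cross-file context as a comment block, then the usual FIM prompt.
    commented = "\n".join(f"// {line}" for line in context.splitlines())
    return (
        f"// context start\n{commented}\n// context end\n\n"
        f"<|fim▁begin|>{prefix}<|fim▁hole|>{suffix}<|fim▁end|>"
    )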

Our open-source version does not officially support this feature, but we have fine-tuned it on similar data for our internal use cases and the results have been very good. We also offer customized services for enterprise users; if you're interested, feel free to message me privately.
My email is [email protected]

OK, thanks.
