emrQA

A Clinical Question Answering Dataset

About

emrQA is a clinical question answering dataset of questions posed by physicians against clinical notes in electronic medical records. Each question comes with paraphrases, a logical form, and an answer. For example:

Question: How was the patient's extensive liver metastases diagnosed?
Paraphrase: What diagnosis was used for the patient's extensive liver metastases?
Logical Form: {LabEvent (x) [date=x, result=x] OR ProcedureEvent (x) [date=x, result=x] OR VitalEvent (x) [date=x, result=x]} reveals ConditionEvent (|problem|)
Answer: An abdominal and pelvic ct scan with iv contrast
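As a rough illustration of how one such record could be held in memory, here is a minimal Python sketch. The class name EmrQAExample, its field names, and the helper all_questions are hypothetical and chosen only for this example; they are not the schema produced by the emrQA generation scripts.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class EmrQAExample:
    """Hypothetical container for one emrQA-style record (not the official schema)."""
    question: str
    paraphrases: List[str]
    logical_form: str
    answer: str


# The example record from the text above, stored field by field.
example = EmrQAExample(
    question="How was the patient's extensive liver metastases diagnosed?",
    paraphrases=[
        "What diagnosis was used for the patient's extensive liver metastases?"
    ],
    logical_form=(
        "{LabEvent (x) [date=x, result=x] OR ProcedureEvent (x) [date=x, result=x] "
        "OR VitalEvent (x) [date=x, result=x]} reveals ConditionEvent (|problem|)"
    ),
    answer="An abdominal and pelvic ct scan with iv contrast",
)


def all_questions(ex: EmrQAExample) -> List[str]:
    """Return the original question followed by its paraphrases."""
    return [ex.question] + ex.paraphrases
```

Grouping a question with its paraphrases this way makes it easy to check that a model gives the same answer for every surface form that shares a logical form.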

For more details about emrQA, please refer to the paper:

Dataset

emrQA has 1 million question-logical form pairs and 400,000+ question-answer evidence pairs.

Please visit our GitHub repository to create the dataset from the i2b2 NER dataset:

Submission

To submit your model, please follow the instructions in the GitHub repository.

Citation

If you use emrQA in your research, please cite our paper:

@article{pampari2018emrqa,
  title={emrQA: A large corpus for question answering on electronic medical records},
  author={Pampari, Anusri and Raghavan, Preethi and Liang, Jennifer and Peng, Jian},
  journal={arXiv preprint arXiv:1809.00732},
  year={2018}
}
      
Leaderboard
Rank  Date          Model                                                                      Code  Exact Match (%)  F1-score (%)
1     Apr 13, 2020  Baseline Model, University of Massachusetts - Amherst (Paper et al. 2020)        00.00            00.19
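
This page does not say how Exact Match and F1-score are computed. The sketch below assumes the token-overlap definitions commonly used for extractive question answering (lowercasing, stripping punctuation and English articles, then comparing tokens); the official evaluation in the GitHub repository may differ. The function names normalize, exact_match, and f1_score are illustrative.

```python
import re
import string
from collections import Counter


def normalize(text: str) -> str:
    """Lowercase, drop punctuation and English articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize(prediction) == normalize(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Harmonic mean of token-level precision and recall."""
    pred_tokens = normalize(prediction).split()
    gold_tokens = normalize(gold).split()
    if not pred_tokens or not gold_tokens:
        # Both empty counts as a match; exactly one empty counts as a miss.
        return float(pred_tokens == gold_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)
```

Under these assumptions, a prediction that differs from the gold answer only in casing and a trailing period still scores an exact match of 1.0, and a shorter prediction such as "ct scan" earns partial F1 credit for the tokens it shares with the gold answer. Leaderboard percentages would then be these per-question scores averaged over the test set and multiplied by 100.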