Submitted by Sea-Connection462 t3_103b1ck in MachineLearning
Legal datasets are extremely expensive because lawyers are, and this has bottlenecked legal NLP.
To address this, we release the Merger Agreement Understanding Dataset (MAUD), with over 39,000 multiple-choice reading comprehension examples for 152 merger agreements that have been manually labeled by legal experts. The dataset was created with the help of the American Bar Association; without their help the dataset would have cost over $5,000,000 to create.
MAUD has substantial room for improvement and could serve as a research challenge for NLP researchers without any legal background.
Dataset and Baselines: https://github.com/TheAtticusProject/maud/
lebeaudiable t1_j2yhty4 wrote
Attorney here just getting into NLP. What should I be doing to take advantage of this intersection? I am going to use this dataset and explore more.