
URL: https://huggingface.co/datasets/lamblamb/pile_of_law_subset

lamblamb/pile_of_law_subset · Datasets at Hugging Face



A subset of the original Pile of Law dataset.

It contains:

  1. US Code
  2. Congressional hearings
  3. SCOTUS oral arguments
  4. Code of Federal Regulations
  5. State codes
  6. FTC advisory opinions
  7. SEC proceedings

Every record has a created_timestamp field, which indicates when the given document was created. For state codes, this field contains only a year. For all other sources, it contains a full date in a format that pandas.Timestamp recognizes with no additional arguments.
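As an illustration of the timestamp convention described above, here is a minimal sketch of parsing created_timestamp values. The helper name parse_created_timestamp, the sample values, and the choice to map a year-only value to January 1 of that year are assumptions made for this example, not part of the dataset. The sketch also assumes a year-only value is a four-digit year, stored either as a string or as an integer.

```python
import pandas as pd


def parse_created_timestamp(value) -> pd.Timestamp:
    """Parse a created_timestamp value into a pandas.Timestamp.

    Year-only values (as described for state codes) are mapped to
    January 1 of that year. That convention is an assumption of this sketch.
    """
    text = str(value).strip()
    if len(text) == 4 and text.isdigit():
        # Year-only value: build the timestamp explicitly. Passing a bare
        # integer to pd.Timestamp would be read as nanoseconds since the epoch.
        return pd.Timestamp(year=int(text), month=1, day=1)
    # Full dates go to pandas.Timestamp with no additional arguments.
    return pd.Timestamp(text)


# Hypothetical sample values, not taken from the dataset.
samples = ["2019-03-14", "March 14, 2019", "2015", 2015]
parsed = [parse_created_timestamp(s) for s in samples]
```

Converting the value to a string first keeps the helper safe if a year happens to be stored as an integer rather than as text.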

@misc{hendersonkrass2022pileoflaw,
 url = {https://arxiv.org/abs/2207.00220},
 author = {Henderson*, Peter and Krass*, Mark S. and Zheng, Lucia and Guha, Neel and Manning, Christopher D. and Jurafsky, Dan and Ho, Daniel E.},
 title = {Pile of Law: Learning Responsible Data Filtering from the Law and a 256GB Open-Source Legal Dataset},
 publisher = {arXiv},
 year = {2022}
}