
Statistical Data: Data Sets

A compilation of sources of statistical information

Repositories and Portals

  • Inter-university Consortium for Political and Social Research
    ICPSR is a data repository for research in the Social Sciences. It also provides leadership and training in data curation and methods of analysis for a diverse and expanding social science research community.
  • US Government Web Services and XML Data Sources
    USGovXML is an index to publicly available web services and XML data sources provided by the US government. USGovXML indexes data sources from all three branches of government, as well as from its boards, commissions, corporations, and independent agencies.
  • figshare
    A website where researchers can share their research outputs. Uploading content is free, and so is accessing it.
  • Dryad
    Dryad is a curated general-purpose repository that makes the data underlying scientific publications discoverable, freely reusable, and citable. Dryad has integrated data submission for a growing list of journals; submission of data from other publications is also welcome.
  • open data portal
    A showcase of sites using the CKAN data management platform. Numerous governments, organisations and communities around the world use this open source software to share data.
  • datahub
    The free, powerful data management platform from the Open Knowledge Foundation, based on the CKAN data management system.
  • Data Market
    DataMarket helps professionals find and understand data. We bring complex and diverse data together in one place and one format so it can be searched, compared, visualized and shared – across teams, organizations or with the world.
  • Yahoo Labs: Webscope
    A reference library of interesting and scientifically useful datasets for non-commercial use by academics and other scientists.

Selected data sets

  • Home Mortgage Disclosure Act
    HMDA data provide information regarding home mortgage lending activity. The data and reports can be used along with the Census demographic information for data analysis purposes.
  • The Airline Data Project
    The goal of the Airline Data Project is to confirm – and in some cases dispel – the conventional wisdom about the airline industry. The ADP provides a new resource for MIT students, the academic community, financial community, and the news media that we hope will be instrumental in identifying the fundamental cycles in the industry's financial performance and the factors driving those cycles.
  • Pew Internet Research
    Selected data sets used in recent work by the Pew Internet and American Life Project.
  • Online Data: Robert Shiller
    Robert Shiller, author of "Irrational Exuberance", shares data from his research on the stock and real estate markets.
  • USA Spending
    Mandated by the Federal Funding Accountability and Transparency Act (Transparency Act). Collecting data on the contracts, grants, loans, and other types of federal spending provides a broader picture of the federal spending process and helps meet the need for greater transparency.
  • National Center for Education Statistics
    NCES data on public schools, including demographic data and fiscal data such as revenues and current expenditures.
  • National Center for Health Statistics
    The NCHS offers public-use data files to provide access to the full scope of the data. This allows researchers to manipulate the data in a format appropriate for their analyses. NCHS makes every effort to release data collected through its surveys and data systems in a timely manner.
  • Social Security Administration
    Public use data files of the SSA.
  • U.S. Supreme Court Database
    The U.S. Supreme Court Database has not just helped fill gaps in our knowledge. It is one of those rare creatures in the law and social science world: an invention that has substantially advanced a large area of study, inspiring research by scholars hailing from no fewer than three and as many as seven disciplines.
  • Collaborative Research in Computational Neuroscience
    To enable concerted efforts in understanding the brain, experimental data and other resources, such as stimuli and analysis tools, should be widely shared by researchers all over the world.
  • Enron e-mail
    This data set was collected and prepared by the CALO Project (A Cognitive Assistant that Learns and Organizes). It contains data from about 150 users, mostly senior management of Enron, organized into folders. The corpus contains a total of about 0.5M messages. This data was originally made public, and posted to the web, by the Federal Energy Regulatory Commission during its investigation.