REDASA COVID-19 Open Data

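Stream data with DagsHub Direct Data Access (DDA):

from dagshub.streaming import DagsHubFilesystem

# Attach the DagsHub repository to the current directory; files are fetched on demand
fs = DagsHubFilesystem(".", repo_url="https://dagshub.com/DagsHub-Datasets/redasa-covid-data-dataset")

# List the contents of the raw-data bucket
fs.listdir("s3://pansurg-curation-raw-open-data")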

Description:

The REaltime DAta Synthesis and Analysis (REDASA) COVID-19 snapshot contains the output of the curation protocol produced by our curator community. A detailed description can be found in our paper. The first S3 bucket listed in Resources contains a large collection of medical documents in text format extracted from the CORD-19 dataset, plus other sources deemed relevant by the REDASA consortium. The second S3 bucket contains a series of documents surfaced by Amazon Kendra that were considered relevant for each medical question asked. The final S3 bucket contains the GroundTruth annotations created by our curator community.

Update Frequency:

Yearly updates

Managed By:

REDASA Consortium, Imperial College London, UK

Resources:

  1. resource:

    • Description: This is the raw data repository containing a common crawl of CORD-19 papers and other sources identified by the REDASA Project.
    • ARN: arn:aws:s3:::pansurg-curation-raw-open-data
    • Region: eu-west-2
    • Type: S3 Bucket
  2. resource:

    • Description: For all the questions curated during the REDASA project, we created a Kendra index. The documents available in this S3 bucket were surfaced by the Kendra index as being relevant to the research medical question.
    • ARN: arn:aws:s3:::pansurg-curation-workflo-kendraqueryresults50d0eb-open-data
    • Region: eu-west-2
    • Type: S3 Bucket
  3. resource:

    • Description: An S3 bucket that contains the final curation data in GroundTruth format.
    • ARN: arn:aws:s3:::pansurg-curation-final-curations-open-data
    • Region: eu-west-2
    • Type: S3 Bucket
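
The three resources above can also be inspected directly with boto3. The following is a minimal sketch, not part of the original README. It assumes the buckets allow anonymous (unsigned) reads, which is common for Registry of Open Data buckets but is not stated here, and it assumes boto3 is installed. The names RESOURCES, bucket_from_arn and list_keys are hypothetical helpers introduced for illustration.

```python
# ARNs and region copied from the Resources list above.
RESOURCES = {
    "raw": "arn:aws:s3:::pansurg-curation-raw-open-data",
    "kendra": "arn:aws:s3:::pansurg-curation-workflo-kendraqueryresults50d0eb-open-data",
    "final": "arn:aws:s3:::pansurg-curation-final-curations-open-data",
}
REGION = "eu-west-2"


def bucket_from_arn(arn: str) -> str:
    """Return the bucket name from an S3 bucket ARN (arn:aws:s3:::<bucket>)."""
    prefix = "arn:aws:s3:::"
    if not arn.startswith(prefix):
        raise ValueError(f"not an S3 bucket ARN: {arn!r}")
    return arn[len(prefix):]


def list_keys(arn: str, max_keys: int = 10) -> list:
    """List up to max_keys object keys in the bucket.

    Assumption: the bucket permits unsigned (anonymous) requests.
    """
    # Imported here so the ARN helper works without boto3 installed.
    import boto3
    from botocore import UNSIGNED
    from botocore.config import Config

    s3 = boto3.client(
        "s3",
        region_name=REGION,
        config=Config(signature_version=UNSIGNED),
    )
    response = s3.list_objects_v2(Bucket=bucket_from_arn(arn), MaxKeys=max_keys)
    # "Contents" is absent when the bucket (or prefix) is empty.
    return [obj["Key"] for obj in response.get("Contents", [])]


# Example call (requires network access):
# keys = list_keys(RESOURCES["raw"])
```

If anonymous access is not permitted, dropping the unsigned Config and using regular AWS credentials would be the natural fallback.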

Tags:

aws-pds, COVID-19, coronavirus, life sciences, information retrieval, natural language processing, text analysis

Publication:

  1. publication:
    • Title: Using a Secure, Continually Updating, Web Source Processing Pipeline to Support the Real-Time Data Synthesis and Analysis of Scientific Literature: Development and Validation Study
    • URL: https://www.jmir.org/2021/5/e25714
    • AuthorName: Uddhav Vaghela, Simon Rabinowicz, Paris Bratsos, Guy Martin, Epameinondas Fritzilas, et al.

About

redasa-covid-data-dataset originates from the Registry of Open Data on AWS.
