BioBERT-PyTorch

Try BioBERT on Google Colab.

This repository provides the PyTorch implementation of BioBERT. You can easily use BioBERT with transformers. This project is supported by the members of DMIS-Lab @ Korea University including Jinhyuk Lee, Wonjin Yoon, Minbyul Jeong, Mujeen Sung, and Gangwoo Kim.

Installation

# Install requirements
pip install -r requirements.txt

# Download all the data using dvc
dvc pull

Note that you should also install torch (see the PyTorch installation instructions) to use transformers.
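
To confirm that the environment is ready before running anything, a quick import check helps (a minimal sketch, not part of the original setup scripts):

# Sanity check: make sure torch and transformers are importable
import torch
import transformers

print("torch:", torch.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())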

Models

We provide the following versions of BioBERT in PyTorch (click here to see all). You can use BioBERT in transformers by setting --model_name_or_path to one of them (see the loading sketch right after this list, and the full training example below).

  • dmis-lab/biobert-base-cased-v1.1: BioBERT-Base v1.1 (+ PubMed 1M)
  • dmis-lab/biobert-large-cased-v1.1: BioBERT-Large v1.1 (+ PubMed 1M)
  • dmis-lab/biobert-base-cased-v1.1-mnli: BioBERT-Base v1.1 pre-trained on MNLI
  • dmis-lab/biobert-base-cased-v1.1-squad: BioBERT-Base v1.1 pre-trained on SQuAD
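
As a quick check that a checkpoint name resolves correctly, here is a minimal sketch of loading BioBERT-Base v1.1 directly with transformers (the example sentence is only an illustration):

from transformers import AutoTokenizer, AutoModel

# Load the BioBERT-Base v1.1 checkpoint from the Hugging Face model hub
tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")
model = AutoModel.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

# Encode a biomedical sentence and inspect the contextual embeddings
inputs = tokenizer("BRCA1 mutations increase the risk of breast cancer.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (1, sequence_length, 768)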

For other versions of BioBERT or for TensorFlow, please see the README in the original BioBERT repository. You can convert any version of BioBERT into PyTorch with the conversion script from Hugging Face transformers.

Example

For instance, to train BioBERT on the NCBI-disease NER dataset, run:

# Pre-process NER datasets
cd named-entity-recognition
./preprocess.sh

# Choose dataset and run
export DATA_DIR=../datasets/NER
export ENTITY=NCBI-disease
python run_ner.py \
    --data_dir ${DATA_DIR}/${ENTITY} \
    --labels ${DATA_DIR}/${ENTITY}/labels.txt \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --output_dir output/${ENTITY} \
    --max_seq_length 128 \
    --num_train_epochs 3 \
    --per_device_train_batch_size 32 \
    --save_steps 1000 \
    --seed 1 \
    --do_train \
    --do_eval \
    --do_predict \
    --overwrite_output_dir
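
After training finishes, run_ner.py writes the fine-tuned model to the --output_dir given above. Here is a minimal inference sketch using the transformers NER pipeline; the output/NCBI-disease path simply mirrors the command above and should be adjusted to your run:

from transformers import pipeline

# Load the fine-tuned checkpoint produced by run_ner.py
# (the path is an assumption based on the --output_dir used above)
ner = pipeline("ner", model="output/NCBI-disease", tokenizer="output/NCBI-disease")

print(ner("The patient was diagnosed with hereditary breast cancer."))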

Please see each task directory for the different examples we currently provide.

Most examples are modified from the examples in Hugging Face transformers.

Citation

@article{10.1093/bioinformatics/btz682,
    author = {Lee, Jinhyuk and Yoon, Wonjin and Kim, Sungdong and Kim, Donghyeon and Kim, Sunkyu and So, Chan Ho and Kang, Jaewoo},
    title = "{BioBERT: a pre-trained biomedical language representation model for biomedical text mining}",
    journal = {Bioinformatics},
    year = {2019},
    month = {09},
    issn = {1367-4803},
    doi = {10.1093/bioinformatics/btz682},
    url = {https://doi.org/10.1093/bioinformatics/btz682},
}

License and Disclaimer

Please see the LICENSE file for details. Downloading data indicates your acceptance of our disclaimer.

Contact

For help or issues using BioBERT-PyTorch, please create an issue.

About

A DagsHub implementation of BioBERT: a pre-trained biomedical language representation model for biomedical text mining

https://arxiv.org/abs/1901.08746