BioBERT Embedding

To get contextualized embeddings from BioBERT-v1.1 (base), run the command below. The output is saved in HDF5 format, so install the h5py package first (pip install h5py). We also provide a sample input file (pubmed_entity_2048.txt) that contains one biomedical concept per line.

export MAX_LENGTH=384
export DATA_PATH=pubmed_entity_2048.txt
export OUTPUT_PATH=pubmed_entity_2048.h5
export BATCH_SIZE=64

python run_embedding.py \
    --model_name_or_path dmis-lab/biobert-base-cased-v1.1 \
    --max_seq_length ${MAX_LENGTH} \
    --data_path ${DATA_PATH} \
    --output_path ${OUTPUT_PATH} \
    --batch_size ${BATCH_SIZE} \
    --pooling mean

Required Arguments

--pooling
- none: one embedding per token (the full sequence of token embeddings)
- first: embedding of the first token (i.e., the embedding at [CLS])
- mean: mean of the token embeddings
- sum: sum of the token embeddings
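
To make the four pooling options concrete, here is a minimal NumPy sketch of what each one computes for a single input. It is not taken from run_embedding.py: the function name `pool`, the `attention_mask` argument, and the choice to exclude padding positions from `none`, `mean`, and `sum` are assumptions made for illustration.

```python
import numpy as np


def pool(token_embeddings, attention_mask, mode):
    """Pool a (seq_len, hidden) array of token embeddings.

    attention_mask is a (seq_len,) array: 1 for real tokens, 0 for padding.
    Dropping padding before pooling is an assumption of this sketch.
    """
    mask = attention_mask.astype(bool)
    real_tokens = token_embeddings[mask]  # keep only non-padding positions
    if mode == "none":
        return real_tokens                # one vector per token
    if mode == "first":
        return token_embeddings[0]        # vector at the [CLS] position
    if mode == "mean":
        return real_tokens.mean(axis=0)   # average over tokens
    if mode == "sum":
        return real_tokens.sum(axis=0)    # sum over tokens
    raise ValueError(f"unknown pooling mode: {mode}")


# Toy input: three real tokens and one padding position, hidden size 2.
tokens = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [0.0, 0.0]])
mask = np.array([1, 1, 1, 0])
for mode in ("none", "first", "mean", "sum"):
    print(mode, pool(tokens, mask, mode).tolist())
```

With `none` the result keeps one row per token, so its shape depends on the input length; the other three modes always return a single vector of the hidden size.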

Load Embeddings

export DATA_PATH=pubmed_entity_2048.txt
export OUTPUT_PATH=pubmed_entity_2048.h5

python load_embedding.py \
    --inputtext_path ${DATA_PATH} \
    --indexed_path ${OUTPUT_PATH}

Result

The number of keys in h5: 2048
entity_name = Lohmann Selected Leghorn
embedding = [2.77513593e-01  2.03759596e-02  1.59252986e-01 ...  7.65920877e-02  2.49284402e-01 -1.48969248e-01]
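
If you would rather read the file from your own code than through load_embedding.py, the sketch below shows one way to do it with h5py. The file layout is an assumption: it treats each top-level key as an entity name that holds either a dataset or a group with a dataset named `embedding`. The helper `read_embeddings` and the stand-in file `demo.h5` are hypothetical; check the real layout of your output file before relying on this.

```python
import os
import tempfile

import h5py
import numpy as np


def read_embeddings(h5_path):
    """Return {key: np.ndarray} for every top-level key in the file."""
    result = {}
    with h5py.File(h5_path, "r") as f:
        for key in f.keys():
            node = f[key]
            # Assumed layout: a group holding an "embedding" dataset;
            # otherwise treat the key itself as a dataset.
            if isinstance(node, h5py.Group):
                result[key] = node["embedding"][:]
            else:
                result[key] = node[:]
    return result


# Build a tiny stand-in file so the sketch runs without BioBERT.
# 768 is the hidden size of a BERT-base model.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.h5")
with h5py.File(demo_path, "w") as f:
    group = f.create_group("Lohmann Selected Leghorn")
    group.create_dataset("embedding", data=np.zeros(768, dtype=np.float32))

embeddings = read_embeddings(demo_path)
print("The number of keys in h5:", len(embeddings))
for entity_name, embedding in embeddings.items():
    print("entity_name =", entity_name)
    print("embedding shape =", embedding.shape)
```

To use it on real output, pass the path you set in OUTPUT_PATH instead of `demo_path`.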

Visualization

The embeddings of different biomedical concepts (obtained from here) are visualized below with t-SNE. Each color or shape denotes one unique biomedical concept, which may have multiple synonyms.
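
A plot of this kind can be approximated with any t-SNE implementation. The README does not say which one produced the figure, so the sketch below uses scikit-learn's `TSNE` as an assumption, and random vectors stand in for real embeddings. The perplexity value is chosen only because the toy set is small.

```python
import numpy as np
from sklearn.manifold import TSNE

# Stand-in data: 30 random vectors with the BERT-base hidden size (768).
# Replace this with pooled embeddings loaded from the .h5 file.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(30, 768)).astype(np.float32)

# Perplexity must be smaller than the number of samples.
tsne = TSNE(n_components=2, perplexity=5, init="pca", random_state=0)
points_2d = tsne.fit_transform(embeddings)
print(points_2d.shape)
```

Each row of `points_2d` is a 2D coordinate that can be passed to a scatter plot, with color or marker shape chosen per concept.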

Contact

For help or issues using BioBERT-PyTorch, please create an issue and tag @mjeensung.
