Abstractive Pre-trained Models & Results
BART Converted to LongformerEncoderDecoder
Important
The models in this section are the output of the convert_bart_to_longformerencoderdecoder.py script without any gradient updates. This means they need to be fine-tuned on a long document summarization dataset, such as arXiv-PubMed, in order to create a model that can summarize long sequences.
The additional position embeddings for these models were initialized by copying the embeddings of the first 512 positions. This initialization is crucial for model performance (see Table 6 in the Longformer paper for results without it).
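As an illustration of this initialization strategy (this is not the actual conversion script), the following sketch tiles a pretrained BART position-embedding matrix out to a longer maximum length. The model name, target length, and attribute access are assumptions chosen for demonstration only:

```python
import torch
from transformers import BartForConditionalGeneration

# Illustrative sketch only: extend the learned position embeddings to a longer
# maximum length by repeatedly copying the pretrained embeddings, as described above.
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large-cnn")
old_weights = model.model.encoder.embed_positions.weight.data  # (old_max_pos, d_model)
d_model = old_weights.size(1)
new_max_pos = 4096  # example target maximum number of positions

new_weights = old_weights.new_empty(new_max_pos, d_model)
copy_len = min(512, old_weights.size(0))  # copy the first 512 positions, per the note above
k = 0
while k < new_max_pos:
    n = min(copy_len, new_max_pos - k)
    new_weights[k:k + n] = old_weights[:n]
    k += n
# new_weights would then replace the encoder position-embedding matrix in the
# converted LongformerEncoderDecoder model.
```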
The models output by the convert_bart_to_longformerencoderdecoder.py script do not work for long documents without further training. Tables 6 and 11 in the Longformer paper suggest that models converted to handle long content may perform well before any additional gradient updates. However, this does not appear to be true for summarization. The converted facebook/bart-large-cnn model from huggingface/transformers (aka longformer-encdec-bart-large-cnn-converted) produces almost random summaries that rarely pertain to the input document. Thus, these models need to be fine-tuned on a long document summarization dataset.
These are huggingface/transformers models, so they need to be used with the --model_name_or_path option. They can also be loaded directly in huggingface/transformers using LEDForConditionalGeneration.from_pretrained().
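For example, a converted (and ideally fine-tuned) checkpoint could be loaded and run roughly as follows; the checkpoint path is a placeholder and the generation settings are illustrative, not the project's defaults:

```python
import torch
from transformers import LEDForConditionalGeneration, LEDTokenizer

# Placeholder path: point this at a converted/fine-tuned checkpoint directory or hub id.
checkpoint = "path/to/longformer-encdec-bart-large-cnn-converted"

tokenizer = LEDTokenizer.from_pretrained(checkpoint)
model = LEDForConditionalGeneration.from_pretrained(checkpoint)

document = "Replace this with the full text of a long document..."
inputs = tokenizer(document, return_tensors="pt", truncation=True, max_length=4096)

# LED models typically use global attention on the first token.
global_attention_mask = torch.zeros_like(inputs["input_ids"])
global_attention_mask[:, 0] = 1

summary_ids = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    global_attention_mask=global_attention_mask,
    max_length=256,
    num_beams=4,
)
print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))
```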
The Google Drive folder containing my contributions to the models below is available at this link.
Name (Shortcut Code) | Initialized From | GDrive Download
---|---|---
Note
In previous versions of TransformerSum, this section listed models that could be used with the outdated LED model (using custom versions of huggingface/transformers and allenai/longformer). Those models can still be found in this Google Drive Folder.
arXiv-PubMed
Name | Comments | Model Download | Data Download
---|---|---|---
led-base-4096-arxiv-pubmed | None | Not yet… |
led-large-4096-arxiv-pubmed | None | Not yet… | Not yet…
led-base-16384-arxiv-pubmed | None | Not yet… | Not yet…
led-large-16384-arxiv-pubmed | None | Not yet… | Not yet…
arXiv-PubMed ROUGE Scores
Test set results on the arXiv-PubMed dataset using ROUGE F1.
Name | ROUGE-1 | ROUGE-2 | ROUGE-L | ROUGE-L-Sum
---|---|---|---|---
led-base-4096-arxiv-pubmed | Not yet… | Not yet… | Not yet… | Not yet…
led-large-4096-arxiv-pubmed | Not yet… | Not yet… | Not yet… | Not yet…
led-base-16384-arxiv-pubmed | Not yet… | Not yet… | Not yet… | Not yet…
led-large-16384-arxiv-pubmed | Not yet… | Not yet… | Not yet… | Not yet…
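Once results are available, scores of this form can be reproduced with any standard ROUGE implementation. Below is a minimal sketch using the rouge-score package (an assumption; it is not necessarily the scorer used to produce the official numbers):

```python
from rouge_score import rouge_scorer

# Hypothetical reference/candidate pair for demonstration only.
reference = "the model summarizes long scientific documents"
candidate = "the model produces summaries of long scientific papers"

scorer = rouge_scorer.RougeScorer(
    ["rouge1", "rouge2", "rougeL", "rougeLsum"], use_stemmer=True
)
scores = scorer.score(reference, candidate)
for name, result in scores.items():
    print(f"{name}: F1 = {result.fmeasure:.4f}")  # report F1, matching the table above
```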
Individual ArXiv and PubMed models
The Hugging Face model hub has two pre-trained models for long text summarization: allenai/led-large-16384-arxiv and patrickvonplaten/led-large-16384-pubmed. These models can be used with pipelines to easily summarize long documents. Please see their model cards (by clicking on their names above) for more information.
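For instance, a summarization pipeline for one of these checkpoints could be constructed roughly as follows (the input text and generation parameters are placeholders):

```python
from transformers import pipeline

# Build a summarization pipeline from the pre-trained arXiv checkpoint named above.
summarizer = pipeline("summarization", model="allenai/led-large-16384-arxiv")

long_document = "Replace this string with the full text of a long scientific paper..."
result = summarizer(long_document, max_length=256, min_length=64, truncation=True)
print(result[0]["summary_text"])
```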