.. _pretrained_ext:

Extractive Pre-trained Models & Results
=======================================

The recommended model is ``distilroberta-base-ext-sum`` because it trains and runs quickly, has relatively few parameters, and scores well.

Notes
-----

The distil* models are of special significance. Distil* is a class of compressed models that started with `DistilBERT `__. DistilBERT stands for Distillated-BERT. It is a small, fast, cheap and light Transformer model based on the BERT architecture. It has 40% fewer parameters than ``bert-base-uncased`` and runs 60% faster while preserving 99% of BERT's performance as measured on the GLUE language understanding benchmark. DistilRoBERTa reaches 95% of RoBERTa-base's performance on GLUE and is twice as fast as RoBERTa while being 35% smaller. More info at `huggingface/transformers `__.

The remarkable performance-to-size ratio of the distil* models can be transferred to summarization. ``distilroberta`` is recommended over ``distilbert`` because of the improvements that the original RoBERTa brought over the original BERT. Essentially, ``distilroberta`` is more modern than ``distilbert``.

`MobileBERT `_ is similar to ``distilbert`` in that it is a smaller version of BERT that achieves amazing performance at a very small size. `According to the authors `__, MobileBERT is *2.64x smaller and 2.45x faster* than DistilBERT. DistilBERT successfully halves the depth of the BERT model by knowledge distillation in the pre-training stage and an optional fine-tuning stage. MobileBERT only uses knowledge transfer in the pre-training stage and does not require a fine-tuned teacher or data augmentation in the down-stream tasks.
DistilBERT compresses BERT by reducing its depth, while MobileBERT compresses BERT by reducing its width, which has been shown to be more effective. MobileBERT usually needs a larger learning rate and more training epochs in fine-tuning than the original BERT.

.. important:: Interactive charts, graphs, raw data, run commands, hyperparameter choices, and more for all trained models are publicly available on the `TransformerSum Weights & Biases page `__. You can download the raw data for each model on this site, or `download an overview as a CSV <../_static/summarization-model-experiments-raw-data.csv>`__. Please open an `issue `__ if you have questions about these models.

Additionally, all of the models on this page were trained completely for free using Tesla P100-PCIE-16GB GPUs on `Google Colaboratory `_. Those that took over 12 hours to train were split into multiple training sessions since ``pytorch_lightning`` enables easy resuming with the ``--resume_from_checkpoint`` argument.

.. _pretrained_ext_cnn_dm:

CNN/DM
------

+---------------------------------+----------------------+----------------+-------------------------+
| Name                            | Comments             | Model Download | Data Download           |
+=================================+======================+================+=========================+
| distilbert-base-uncased-ext-sum | None                 | `Model `__     | `CNN/DM Bert Uncased `_ |
+---------------------------------+----------------------+----------------+-------------------------+
| distilroberta-base-ext-sum      | None                 | `Model `__     | `CNN/DM Roberta `_      |
+---------------------------------+----------------------+----------------+-------------------------+
| bert-base-uncased-ext-sum       | None                 | `Model `__     | `CNN/DM Bert Uncased `_ |
+---------------------------------+----------------------+----------------+-------------------------+
| roberta-base-ext-sum            | None                 | `Model `__     | `CNN/DM Roberta `_      |
+---------------------------------+----------------------+----------------+-------------------------+
| bert-large-uncased-ext-sum      | None                 | Not yet...     | `CNN/DM Bert Uncased `_ |
+---------------------------------+----------------------+----------------+-------------------------+
| roberta-large-ext-sum           | None                 | Not yet...     | `CNN/DM Roberta `_      |
+---------------------------------+----------------------+----------------+-------------------------+
| longformer-base-4096-ext-sum    | None                 | Not yet...     | `CNN/DM Longformer `_   |
+---------------------------------+----------------------+----------------+-------------------------+
| mobilebert-uncased-ext-sum      | Trained with lr=8e-5 | `Model `__     | `CNN/DM Bert Uncased `_ |
+---------------------------------+----------------------+----------------+-------------------------+

CNN/DM ROUGE Scores
^^^^^^^^^^^^^^^^^^^

Test set results on the CNN/DailyMail dataset using ROUGE F\ :sub:`1`\ .

+---------------------------------+------------+------------+------------+-------------+
| Name                            | ROUGE-1    | ROUGE-2    | ROUGE-L    | ROUGE-L-Sum |
+=================================+============+============+============+=============+
| distilbert-base-uncased-ext-sum | 42.71      | 19.91      | 27.52      | 39.18       |
+---------------------------------+------------+------------+------------+-------------+
| distilroberta-base-ext-sum      | 42.87      | 20.02      | 27.46      | 39.31       |
+---------------------------------+------------+------------+------------+-------------+
| bert-base-uncased-ext-sum       | 42.78      | 19.83      | 27.43      | 39.18       |
+---------------------------------+------------+------------+------------+-------------+
| roberta-base-ext-sum            | 43.24      | 20.36      | 27.64      | 39.65       |
+---------------------------------+------------+------------+------------+-------------+
| bert-large-uncased-ext-sum      | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| roberta-large-ext-sum           | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| longformer-base-4096-ext-sum    | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| mobilebert-uncased-ext-sum      | 42.01      | 19.31      | 26.89      | 38.53       |
+---------------------------------+------------+------------+------------+-------------+

.. note:: Currently, ``distilbert`` beats ``bert-base-uncased`` by about 0.14% on average: the mean ratio of their scores is 1.0014 (``(42.71/42.78+19.91/19.83+27.52/27.43+39.18/39.18)/4=1.0014197729882865``). Since ``bert-base-uncased`` has more parameters than ``distilbert``, this is unusual and is likely a tuning issue. This suggests that tuning the hyperparameters of ``bert-base-uncased`` can improve its performance.
   ``distilroberta`` matches 99.0% of the performance of ``roberta-base`` (``(42.87/43.24+20.02/20.36+27.46/27.64+39.31/39.65)/4``, which is approximately 0.9899).

.. important:: ``mobilebert-uncased-ext-sum`` achieves 96.59% (``(42.01/43.25+19.31/20.24+38.53/39.63)/3``) of the performance of `BertSum `_ while containing 4.45 times (``109483009/24582401``) fewer parameters. It achieves 94.06% (``(42.01/44.41+19.31/20.86+38.53/40.55)/3``) of the performance of `MatchSum (Zhong et al., 2020) `_, the current extractive state-of-the-art.

CNN/DM Training Times and Model Sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------------------------+-------------+------------+
| Name                            | Time        | Model Size |
+=================================+=============+============+
| distilbert-base-uncased-ext-sum | 6h 22m 32s  | 796.4MB    |
+---------------------------------+-------------+------------+
| distilroberta-base-ext-sum      | 6h 21m 37s  | 980.8MB    |
+---------------------------------+-------------+------------+
| bert-base-uncased-ext-sum       | 12h 51m 17s | 1.3GB      |
+---------------------------------+-------------+------------+
| roberta-base-ext-sum            | 13h 7m 3s   | 1.5GB      |
+---------------------------------+-------------+------------+
| bert-large-uncased-ext-sum      | Not yet...  | Not yet... |
+---------------------------------+-------------+------------+
| roberta-large-ext-sum           | Not yet...  | Not yet... |
+---------------------------------+-------------+------------+
| longformer-base-4096-ext-sum    | Not yet...  | Not yet... |
+---------------------------------+-------------+------------+
| mobilebert-uncased-ext-sum      | 8h 26m 32s  | 295.6MB    |
+---------------------------------+-------------+------------+

.. important:: ``distilroberta-base-ext-sum`` trains in about 6.5 hours on 1 P100-PCIE-16GB GPU, while `MatchSum `_, the current state-of-the-art in extractive summarization on CNN/DM, takes 30 hours on 8 Tesla-V100-16G GPUs to train.
   If a V100 is about 2x as powerful as a P100, then it would take 480 hours (``30*8*2``) to train MatchSum on one P100. This simplistic approximation suggests that it takes about 74x (``480/6.5``) more time to train MatchSum than ``distilroberta-base-ext-sum``.

WikiHow
-------

+---------------------------------+----------------------+----------------+--------------------------+
| Name                            | Comments             | Model Download | Data Download            |
+=================================+======================+================+==========================+
| distilbert-base-uncased-ext-sum | None                 | `Model `__     | `WikiHow Bert Uncased `_ |
+---------------------------------+----------------------+----------------+--------------------------+
| distilroberta-base-ext-sum      | None                 | `Model `__     | `WikiHow Roberta `_      |
+---------------------------------+----------------------+----------------+--------------------------+
| bert-base-uncased-ext-sum       | None                 | `Model `__     | `WikiHow Bert Uncased `_ |
+---------------------------------+----------------------+----------------+--------------------------+
| roberta-base-ext-sum            | None                 | `Model `__     | `WikiHow Roberta `_      |
+---------------------------------+----------------------+----------------+--------------------------+
| bert-large-uncased-ext-sum      | None                 | Not yet...     | `WikiHow Bert Uncased `_ |
+---------------------------------+----------------------+----------------+--------------------------+
| roberta-large-ext-sum           | None                 | Not yet...     | `WikiHow Roberta `_      |
+---------------------------------+----------------------+----------------+--------------------------+
| mobilebert-uncased-ext-sum      | Trained with lr=8e-5 | `Model `__     | `WikiHow Bert Uncased `_ |
+---------------------------------+----------------------+----------------+--------------------------+

WikiHow ROUGE Scores
^^^^^^^^^^^^^^^^^^^^

Test set results on the WikiHow dataset using ROUGE F\ :sub:`1`\ .

+---------------------------------+------------+------------+------------+-------------+
| Name                            | ROUGE-1    | ROUGE-2    | ROUGE-L    | ROUGE-L-Sum |
+=================================+============+============+============+=============+
| distilbert-base-uncased-ext-sum | 30.69      | 8.65       | 19.13      | 28.58       |
+---------------------------------+------------+------------+------------+-------------+
| distilroberta-base-ext-sum      | 31.07      | 8.96       | 19.34      | 28.95       |
+---------------------------------+------------+------------+------------+-------------+
| bert-base-uncased-ext-sum       | 30.68      | 8.67       | 19.16      | 28.59       |
+---------------------------------+------------+------------+------------+-------------+
| roberta-base-ext-sum            | 31.26      | 9.09       | 19.47      | 29.14       |
+---------------------------------+------------+------------+------------+-------------+
| bert-large-uncased-ext-sum      | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| roberta-large-ext-sum           | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| mobilebert-uncased-ext-sum      | 30.72      | 8.78       | 19.18      | 28.59       |
+---------------------------------+------------+------------+------------+-------------+

.. note:: These are the results of extractive models, and they are fairly good in that they come close to the scores of abstractive models. The R1/R2/RL-Sum results of a base transformer model from the `PEGASUS paper `_ are 32.48/10.53/23.86. The net difference from ``distilroberta-base-ext-sum`` (base transformer minus ``distilroberta-base-ext-sum``) is +1.41/+1.57/-5.09. Compared to the **abstractive** SOTA prior to PEGASUS, which was 28.53/9.23/26.54, ``distilroberta-base-ext-sum`` performs +2.54/-0.27/+2.41. However, the base PEGASUS model obtains scores of 36.58/15.64/30.01, which are much better than ``distilroberta-base-ext-sum``, as one would expect.

WikiHow Training Times and Model Sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------------------------+------------+------------+
| Name                            | Time       | Model Size |
+=================================+============+============+
| distilbert-base-uncased-ext-sum | 3h 42m 12s | 796.4MB    |
+---------------------------------+------------+------------+
| distilroberta-base-ext-sum      | 4h 27m 23s | 980.8MB    |
+---------------------------------+------------+------------+
| bert-base-uncased-ext-sum       | 7h 29m 06s | 1.3GB      |
+---------------------------------+------------+------------+
| roberta-base-ext-sum            | 7h 35m 59s | 1.5GB      |
+---------------------------------+------------+------------+
| bert-large-uncased-ext-sum      | Not yet... | Not yet... |
+---------------------------------+------------+------------+
| roberta-large-ext-sum           | Not yet... | Not yet... |
+---------------------------------+------------+------------+
| mobilebert-uncased-ext-sum      | 4h 22m 19s | 295.6MB    |
+---------------------------------+------------+------------+

arXiv-PubMed
------------

+---------------------------------+----------+----------------+-------------------------------+
| Name                            | Comments | Model Download | Data Download                 |
+=================================+==========+================+===============================+
| distilbert-base-uncased-ext-sum | None     | `Model `__     | `arXiv-PubMed Bert Uncased `_ |
+---------------------------------+----------+----------------+-------------------------------+
| distilroberta-base-ext-sum      | None     | `Model `__     | `arXiv-PubMed Roberta `_      |
+---------------------------------+----------+----------------+-------------------------------+
| bert-base-uncased-ext-sum       | None     | `Model `__     | `arXiv-PubMed Bert Uncased `_ |
+---------------------------------+----------+----------------+-------------------------------+
| roberta-base-ext-sum            | None     | `Model `__     | `arXiv-PubMed Roberta `_      |
+---------------------------------+----------+----------------+-------------------------------+
| bert-large-uncased-ext-sum      | None     | Not yet...     | `arXiv-PubMed Bert Uncased `_ |
+---------------------------------+----------+----------------+-------------------------------+
| roberta-large-ext-sum           | None     | Not yet...     | `arXiv-PubMed Roberta `_      |
+---------------------------------+----------+----------------+-------------------------------+
| longformer-base-4096-ext-sum    | None     | Not yet...     | `arXiv-PubMed Longformer `_   |
+---------------------------------+----------+----------------+-------------------------------+
| mobilebert-uncased-ext-sum      | None     | `Model `__     | `arXiv-PubMed Bert Uncased `_ |
+---------------------------------+----------+----------------+-------------------------------+

arXiv-PubMed ROUGE Scores
^^^^^^^^^^^^^^^^^^^^^^^^^

Test set results on the arXiv-PubMed dataset using ROUGE F\ :sub:`1`\ .

+---------------------------------+------------+------------+------------+-------------+
| Name                            | ROUGE-1    | ROUGE-2    | ROUGE-L    | ROUGE-L-Sum |
+=================================+============+============+============+=============+
| distilbert-base-uncased-ext-sum | 34.93      | 12.21      | 19.62      | 31.00       |
+---------------------------------+------------+------------+------------+-------------+
| distilroberta-base-ext-sum      | 34.70      | 12.16      | 19.52      | 30.82       |
+---------------------------------+------------+------------+------------+-------------+
| bert-base-uncased-ext-sum       | 34.80      | 12.26      | 19.67      | 30.92       |
+---------------------------------+------------+------------+------------+-------------+
| roberta-base-ext-sum            | 34.81      | 12.26      | 19.65      | 30.91       |
+---------------------------------+------------+------------+------------+-------------+
| bert-large-uncased-ext-sum      | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| roberta-large-ext-sum           | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| longformer-base-4096-ext-sum    | Not yet... | Not yet... | Not yet... | Not yet...  |
+---------------------------------+------------+------------+------------+-------------+
| mobilebert-uncased-ext-sum      | 33.97      | 11.74      | 19.63      | 30.19       |
+---------------------------------+------------+------------+------------+-------------+

.. note:: These are the results of extractive models, and they are fairly good in that they come close to the scores of abstractive models. The R1/R2/RL-Sum results of a base transformer model from the `PEGASUS paper `_ are 34.79/7.69/19.51 (average of 35.63/7.95/20.00 (arXiv) and 33.94/7.43/19.02 (PubMed)). The net difference from ``distilroberta-base-ext-sum`` (base transformer minus ``distilroberta-base-ext-sum``) is +0.09/-4.47/-11.31. Compared to the **abstractive** SOTA prior to PEGASUS, which was 41.09/14.93/23.57 (average of 41.59/14.26/23.55 (arXiv) and 40.59/15.59/23.59 (PubMed)), ``distilroberta-base-ext-sum`` performs -6.39/-2.77/+7.25. However, the base PEGASUS model obtains scores of 37.39/12.66/23.87 (average of 34.81/10.16/22.50 (arXiv) and 39.98/15.15/25.23 (PubMed)). The large model obtains scores of 45.10/18.59/26.75 (average of 44.70/17.27/25.80 (arXiv) and 45.49/19.90/27.69 (PubMed)), which are much better than ``distilroberta-base-ext-sum``, as one would expect.

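The averages and differences quoted in the arXiv-PubMed note can be re-derived with a few lines of Python. This is only a sketch of the arithmetic, not part of TransformerSum: the helper names (``to_decimals``, ``average``, ``difference``) are hypothetical, and rounding half up to two decimals is an assumption that happens to reproduce the base transformer and previous-SOTA averages given in the note.

```python
from decimal import Decimal, ROUND_HALF_UP

CENT = Decimal("0.01")


def to_decimals(scores):
    """Parse a slash-separated score string such as '34.70/12.16/30.82'."""
    return tuple(Decimal(part) for part in scores.split("/"))


def average(arxiv, pubmed):
    """Average two score triples, rounding half up to two decimals (assumed)."""
    return tuple(
        ((a + p) / 2).quantize(CENT, rounding=ROUND_HALF_UP)
        for a, p in zip(to_decimals(arxiv), to_decimals(pubmed))
    )


def difference(left, right):
    """Element-wise left - right for two score triples."""
    return tuple(a - b for a, b in zip(left, right))


# Values copied from the note and the ROUGE table in this section.
distilroberta = to_decimals("34.70/12.16/30.82")  # R1/R2/RL-Sum
transformer_base = average("35.63/7.95/20.00", "33.94/7.43/19.02")
previous_sota = average("41.59/14.26/23.55", "40.59/15.59/23.59")

print("base transformer average:", transformer_base)
print("base transformer - distilroberta:", difference(transformer_base, distilroberta))
print("distilroberta - previous SOTA:", difference(distilroberta, previous_sota))
```

``Decimal`` is used instead of ``float`` so that halves such as 34.785 round the same way every time; with binary floats the last digit can differ.
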
arXiv-PubMed Training Times and Model Sizes
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

+---------------------------------+-------------+------------+
| Name                            | Time        | Model Size |
+=================================+=============+============+
| distilbert-base-uncased-ext-sum | 06h 46m 0s  | 796.4MB    |
+---------------------------------+-------------+------------+
| distilroberta-base-ext-sum      | 06h 33m 58s | 980.8MB    |
+---------------------------------+-------------+------------+
| bert-base-uncased-ext-sum       | 14h 40m 10s | 1.3GB      |
+---------------------------------+-------------+------------+
| roberta-base-ext-sum            | 14h 39m 43s | 1.5GB      |
+---------------------------------+-------------+------------+
| bert-large-uncased-ext-sum      | Not yet...  | Not yet... |
+---------------------------------+-------------+------------+
| roberta-large-ext-sum           | Not yet...  | Not yet... |
+---------------------------------+-------------+------------+
| longformer-base-4096-ext-sum    | Not yet...  | Not yet... |
+---------------------------------+-------------+------------+
| mobilebert-uncased-ext-sum      | 09h 5m 45s  | 295.6MB    |
+---------------------------------+-------------+------------+
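
To compare the training times in the table above, the ``Time`` cells can be converted to fractional hours. The snippet below is a minimal sketch: ``to_hours`` and ``DURATION`` are hypothetical names chosen for illustration, and the ratio it prints compares single training runs, so it is only a rough indication.

```python
import re

# Matches 'Time' cells such as '06h 33m 58s' or '09h 5m 45s'.
DURATION = re.compile(r"(\d+)h (\d+)m (\d+)s")


def to_hours(duration):
    """Convert a 'Time' cell into fractional hours."""
    match = DURATION.fullmatch(duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    hours, minutes, seconds = (int(group) for group in match.groups())
    return hours + minutes / 60 + seconds / 3600


# 'Time' cells copied from the arXiv-PubMed table above.
times = {
    "distilbert-base-uncased-ext-sum": "06h 46m 0s",
    "distilroberta-base-ext-sum": "06h 33m 58s",
    "bert-base-uncased-ext-sum": "14h 40m 10s",
    "roberta-base-ext-sum": "14h 39m 43s",
    "mobilebert-uncased-ext-sum": "09h 5m 45s",
}

hours = {name: to_hours(cell) for name, cell in times.items()}
ratio = hours["roberta-base-ext-sum"] / hours["distilroberta-base-ext-sum"]
print(f"roberta-base took {ratio:.2f}x as long to train as distilroberta-base")
```
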