Main Script
The same script is used to train, validate, and test both extractive and abstractive models. The --mode argument switches between using ExtractiveSummarizer and AbstractiveSummarizer, with ExtractiveSummarizer as the default.
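For instance, switching between the two summarization techniques is just a matter of changing the --mode flag. The mode-specific data and model arguments documented on the Extractive and Abstractive pages are omitted here, so treat these as sketches of the generic part of the command rather than complete invocations:

python main.py --mode extractive --do_train
python main.py --mode abstractive --do_train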
Note
The below --help output only shows the generic commands that can be used for both extractive and abstractive models. Run the command with --mode set to see the commands specific to each summarization technique. The --help output for each is also in this documentation: Extractive and Abstractive.
All training arguments can be found in the pytorch_lightning trainer documentation.
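For example, Trainer-backed options such as --max_epochs, --gpus, and --gradient_clip_val are exposed directly on the command line; the values below are illustrative only:

python main.py --do_train --max_epochs 3 --gpus 1 --gradient_clip_val 1.0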
Output of python main.py --help:
usage: main.py [-h] [--mode {extractive,abstractive}]
[--default_root_dir DEFAULT_ROOT_DIR]
[--weights_save_path WEIGHTS_SAVE_PATH] [--learning_rate LEARNING_RATE]
[--min_epochs MIN_EPOCHS] [--max_epochs MAX_EPOCHS]
[--min_steps MIN_STEPS] [--max_steps MAX_STEPS]
[--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES]
[--check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH] [--gpus GPUS]
[--gradient_clip_val GRADIENT_CLIP_VAL] [--overfit_batches OVERFIT_BATCHES]
[--train_percent_check TRAIN_PERCENT_CHECK]
[--val_percent_check VAL_PERCENT_CHECK]
[--test_percent_check TEST_PERCENT_CHECK] [--amp_level AMP_LEVEL]
[--precision PRECISION] [--seed SEED] [--profiler]
[--progress_bar_refresh_rate PROGRESS_BAR_REFRESH_RATE]
[--num_sanity_val_steps NUM_SANITY_VAL_STEPS]
[--use_logger {tensorboard,wandb}] [--wandb_project WANDB_PROJECT]
[--gradient_checkpointing] [--do_train] [--do_test]
[--load_weights LOAD_WEIGHTS]
[--load_from_checkpoint LOAD_FROM_CHECKPOINT]
[--resume_from_checkpoint RESUME_FROM_CHECKPOINT]
[--use_custom_checkpoint_callback]
[--custom_checkpoint_every_n CUSTOM_CHECKPOINT_EVERY_N]
[--no_wandb_logger_log_model]
[--auto_scale_batch_size AUTO_SCALE_BATCH_SIZE] [--lr_find]
[--adam_epsilon ADAM_EPSILON] [--optimizer_type OPTIMIZER_TYPE]
[--ranger-k RANGER_K] [--warmup_steps WARMUP_STEPS]
[--use_scheduler USE_SCHEDULER] [--end_learning_rate END_LEARNING_RATE]
[--weight_decay WEIGHT_DECAY] [-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}]
optional arguments:
-h, --help show this help message and exit
--mode {extractive,abstractive}
Extractive or abstractive summarization training. Default is
'extractive'.
--default_root_dir DEFAULT_ROOT_DIR
Default path for logs and weights. To use this option with the
`wandb` logger, specify the `--no_wandb_logger_log_model` option.
--weights_save_path WEIGHTS_SAVE_PATH
Where to save weights if specified. Will override
`--default_root_dir` for checkpoints only. Use this if for
whatever reason you need the checkpoints stored in a different
place than the logs written in `--default_root_dir`. This option
will override the save locations when using a custom checkpoint
callback, such as those created when using
`--use_custom_checkpoint_callback` or
`--custom_checkpoint_every_n`. If you are using the `wandb`
logger, then you must also set `--no_wandb_logger_log_model` when
using this option. Model weights are saved with the wandb logs to
be uploaded to wandb.ai by default. Setting this option without
setting `--no_wandb_logger_log_model` effectively creates two
save paths, which will crash the script.
--learning_rate LEARNING_RATE
The initial learning rate for the optimizer.
--min_epochs MIN_EPOCHS
Limits training to a minimum number of epochs
--max_epochs MAX_EPOCHS
Limits training to a maximum number of epochs
--min_steps MIN_STEPS
Limits training to a minimum number of steps
--max_steps MAX_STEPS
Limits training to a maximum number of steps
--accumulate_grad_batches ACCUMULATE_GRAD_BATCHES
Accumulates grads every k batches. A single step is one gradient
accumulation cycle, so setting this value to 2 will cause 2
batches to be processed for each step.
--check_val_every_n_epoch CHECK_VAL_EVERY_N_EPOCH
Check val every n train epochs.
--gpus GPUS Number of GPUs to train on or which GPUs to train on. (default:
-1 (all GPUs))
--gradient_clip_val GRADIENT_CLIP_VAL
Gradient clipping value
--overfit_batches OVERFIT_BATCHES
Uses this much data of all datasets (training, validation, test).
Useful for quickly debugging or trying to overfit on purpose.
--train_percent_check TRAIN_PERCENT_CHECK
How much of training dataset to check. Useful when debugging or
testing something that happens at the end of an epoch.
--val_percent_check VAL_PERCENT_CHECK
How much of validation dataset to check. Useful when debugging or
testing something that happens at the end of an epoch.
--test_percent_check TEST_PERCENT_CHECK
How much of test dataset to check.
--amp_level AMP_LEVEL
The optimization level to use (O1, O2, etc…) for 16-bit GPU
precision (using NVIDIA apex under the hood).
--precision PRECISION
Full precision (32), half precision (16). Can be used on CPU, GPU
or TPUs.
--seed SEED Seed for reproducible results. Can negatively impact performance
in some cases.
--profiler To profile individual steps during training and assist in
identifying bottlenecks.
--progress_bar_refresh_rate PROGRESS_BAR_REFRESH_RATE
How often to refresh progress bar (in steps). In notebooks,
faster refresh rates (lower numbers) are known to crash them
because of their screen refresh rates, so raise it to 50 or more.
--num_sanity_val_steps NUM_SANITY_VAL_STEPS
Sanity check runs n batches of val before starting the training
routine. This catches any bugs in your validation without having
to wait for the first validation check.
--use_logger {tensorboard,wandb}
Which program to use for logging. If `wandb` is chosen then model
weights will automatically be uploaded to wandb.ai.
--wandb_project WANDB_PROJECT
The wandb project to save training runs to if `--use_logger` is
set to `wandb`.
--gradient_checkpointing
Enable gradient checkpointing (save memory at the expense of a
slower backward pass) for the word embedding model. More info:
https://github.com/huggingface/transformers/pull/4659#issue-424841871
--do_train Run the training procedure.
--do_test Run the testing procedure.
--load_weights LOAD_WEIGHTS
Loads the model weights from a given checkpoint
--load_from_checkpoint LOAD_FROM_CHECKPOINT
Loads the model weights and hyperparameters from a given
checkpoint.
--resume_from_checkpoint RESUME_FROM_CHECKPOINT
To resume training from a specific checkpoint pass in the path
here. Automatically restores model, epoch, step, LR schedulers,
apex, etc...
--use_custom_checkpoint_callback
Use the custom checkpointing callback specified in `main()` by
`args.checkpoint_callback`. By default this custom callback saves
the model every epoch and never deletes the saved weights files.
You can change the save path by setting the `--weights_save_path`
option.
--custom_checkpoint_every_n CUSTOM_CHECKPOINT_EVERY_N
The number of steps between additional checkpoints. By default
checkpoints are saved every epoch. Setting this value will save
them every epoch and every N steps. This does not use the same
callback as `--use_custom_checkpoint_callback` but instead uses a
different class called `StepCheckpointCallback`. You can change
the save path by setting the `--weights_save_path` option.
--no_wandb_logger_log_model
Only applies when using the `wandb` logger. Set this argument to
NOT save checkpoints in wandb directory to upload to W&B servers.
--auto_scale_batch_size AUTO_SCALE_BATCH_SIZE
Auto scaling of batch size may be enabled to find the largest
batch size that fits into memory. Larger batch size often yields
better estimates of gradients, but may also result in longer
training time. Currently, this feature supports two modes 'power'
scaling and 'binsearch' scaling. In 'power' scaling, starting
from a batch size of 1 keeps doubling the batch size until an
out-of-memory (OOM) error is encountered. Setting the argument to
'binsearch' continues to finetune the batch size by performing a
binary search. 'binsearch' is the recommended option.
--lr_find Runs a learning rate finder algorithm (see
https://arxiv.org/abs/1506.01186) before any training, to find
optimal initial learning rate.
--adam_epsilon ADAM_EPSILON
Epsilon for Adam optimizer.
--optimizer_type OPTIMIZER_TYPE
Which optimizer to use: `adamw` (default), `ranger`, `qhadam`,
`radam`, or `adabound`.
--ranger-k RANGER_K Ranger (LookAhead) optimizer k value (default: 6). LookAhead
keeps a single extra copy of the weights, then lets the
internalized ‘faster’ optimizer (for Ranger, that’s RAdam)
explore for 5 or 6 batches. The batch interval is specified via
the k parameter.
--warmup_steps WARMUP_STEPS
Linear warmup over warmup_steps. Only active if `--use_scheduler`
is set to linear.
--use_scheduler USE_SCHEDULER
Three options: 1. `linear`: Use a linear schedule that increases
linearly over `--warmup_steps` to `--learning_rate` then
decreases linearly for the rest of the training process. 2.
`onecycle`: Use the one cycle policy with a maximum learning rate
of `--learning_rate`. (default: False, don't use any scheduler)
3. `poly`: polynomial learning rate decay from `--learning_rate`
to `--end_learning_rate`
--end_learning_rate END_LEARNING_RATE
The ending learning rate when `--use_scheduler` is poly.
--weight_decay WEIGHT_DECAY
-l {DEBUG,INFO,WARNING,ERROR,CRITICAL}, --log {DEBUG,INFO,WARNING,ERROR,CRITICAL}
Set the logging level (default: 'Info').
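As a rough end-to-end sketch of how several of the generic options above combine (the paths, wandb project name, and checkpoint filename are placeholders, and the mode-specific arguments from the Extractive and Abstractive pages are again omitted):

# Train and test with wandb logging, saving checkpoints outside the wandb directory
python main.py --mode extractive --do_train --do_test \
    --max_epochs 3 --accumulate_grad_batches 2 --precision 16 \
    --use_logger wandb --wandb_project my-summarization-project \
    --use_custom_checkpoint_callback --weights_save_path ./checkpoints \
    --no_wandb_logger_log_model

# Resume an interrupted run from a saved checkpoint
python main.py --mode extractive --do_train \
    --resume_from_checkpoint ./checkpoints/epoch=2.ckpt

Note that --weights_save_path is combined with --no_wandb_logger_log_model in the first command, as required when redirecting checkpoints away from the `wandb` logger.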