General API Reference
This page contains the API reference for modules whose code is shared by the extractive and abstractive summarization components.
Helpers
- class helpers.LabelSmoothingLoss(label_smoothing, tgt_vocab_size, ignore_index=-100)[source]
CrossEntropyLoss with label smoothing. The KL-divergence between q_{smoothed ground truth prob.}(w) and p_{prob. computed by model}(w) is minimized. From OpenNMT with modifications: https://github.com/OpenNMT/OpenNMT-py/blob/e8622eb5c6117269bb3accd8eb6f66282b5e67d9/onmt/utils/loss.py#L186
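The smoothed target distribution and the KL term described above can be sketched in plain Python. This is an illustration only, not the module's code: the names smoothed_targets and kl_divergence are hypothetical, and the sketch assumes the smoothing mass is spread uniformly over all non-target classes. Special handling of ignored or padding positions (the ignore_index argument) is left out.

```python
import math


def smoothed_targets(target, vocab_size, label_smoothing):
    """Build q: 1 - label_smoothing on the target, the rest spread uniformly.

    Assumption: every non-target class receives the same share of the mass.
    """
    off_value = label_smoothing / (vocab_size - 1)
    q = [off_value] * vocab_size
    q[target] = 1.0 - label_smoothing
    return q


def kl_divergence(q, log_p):
    """KL(q || p) from target probabilities q and model log-probabilities log_p."""
    return sum(qi * (math.log(qi) - lpi) for qi, lpi in zip(q, log_p) if qi > 0)


# Made-up model probabilities over a 4-word vocabulary.
probs = [0.7, 0.1, 0.1, 0.1]
log_p = [math.log(p) for p in probs]

q = smoothed_targets(target=0, vocab_size=4, label_smoothing=0.1)
loss = kl_divergence(q, log_p)
```

With smoothing, the loss reaches zero only when the model reproduces the smoothed distribution, not when it puts all of its probability on the target word.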
- class helpers.SortishSampler(data, batch_size, pad_token_id)[source]
Go through the text data by order of src length with a bit of randomness. From fastai repo with modifications.
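One way to get an order that is "sorted by length with a bit of randomness" is to shuffle the indices, cut them into large chunks, and sort each chunk by length. The sketch below shows that idea under stated assumptions: sortish_indices and chunk_multiplier are hypothetical names, and the real sampler may differ in details such as how batches are arranged after sorting.

```python
import random


def sortish_indices(lengths, batch_size, chunk_multiplier=50, seed=None):
    """Return example indices ordered by length within randomly formed chunks."""
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)  # the source of randomness

    # Assumption: each chunk spans several batches so sorting stays local.
    chunk_size = batch_size * chunk_multiplier
    ordered = []
    for start in range(0, len(indices), chunk_size):
        chunk = indices[start:start + chunk_size]
        # Longest first inside a chunk, so each batch holds similar lengths.
        chunk.sort(key=lambda i: lengths[i], reverse=True)
        ordered.extend(chunk)
    return ordered


lengths = [5, 12, 3, 8, 20, 1, 7, 15]
order = sortish_indices(lengths, batch_size=2, chunk_multiplier=2, seed=0)
```

Grouping examples of similar length reduces the amount of padding per batch, while the initial shuffle keeps the order from being identical every epoch.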
- class helpers.StepCheckpointCallback(step_interval=1000, save_name='model', save_path='.', num_saves_to_keep=5)[source]
- helpers.block_trigrams(candidate, prediction)[source]
Decrease repetition in summaries by checking if a trigram from prediction exists in candidate.
- Parameters:
candidate (str) – The string to check for trigrams from prediction
prediction (list) – A list of strings to extract trigrams from
- Returns:
True if overlapping trigrams detected, False otherwise.
- Return type:
bool
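The trigram check described above can be sketched as follows. The helper names get_trigrams and has_trigram_overlap are hypothetical, and whitespace tokenization and case-sensitive matching are assumptions; the library's own tokenization may differ.

```python
def get_trigrams(text):
    """Return the set of word trigrams in text (assumes whitespace tokenization)."""
    words = text.split()
    return {tuple(words[i:i + 3]) for i in range(len(words) - 2)}


def has_trigram_overlap(candidate, prediction):
    """True if candidate shares a trigram with any string in prediction."""
    candidate_trigrams = get_trigrams(candidate)
    for sentence in prediction:
        if candidate_trigrams & get_trigrams(sentence):
            return True
    return False


selected = ["the cat sat on the mat"]
repeated = has_trigram_overlap("yesterday the cat sat down", selected)
fresh = has_trigram_overlap("a dog barked loudly today", selected)
```

A summarizer would typically skip a candidate sentence when the check returns True and move on to the next highest-ranked sentence.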
- helpers.generic_configure_optimizers(hparams, train_dataloader, params_to_update)[source]
Configure the optimizers. Returns the optimizer and scheduler specified by the values in hparams. This is a generic function that both the extractive and abstractive scripts use.
- helpers.load_json(json_file)[source]
Load a json file even if it is compressed with gzip.
- Parameters:
json_file (str) – Path to json file
- Returns:
(documents, file_path), loaded json and path to file
- Return type:
tuple
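A minimal sketch of loading JSON that may or may not be gzip-compressed, returning the same (documents, file_path) shape documented above. The name load_json_sketch is hypothetical, and detecting compression from a ".gz" file extension is an assumption rather than the library's confirmed behavior.

```python
import gzip
import json
import os
import tempfile


def load_json_sketch(json_file):
    """Load JSON from a plain or gzip-compressed file."""
    # Assumption: compressed files are recognized by their ".gz" extension.
    if json_file.endswith(".gz"):
        with gzip.open(json_file, "rt", encoding="utf-8") as f:
            documents = json.load(f)
    else:
        with open(json_file, "r", encoding="utf-8") as f:
            documents = json.load(f)
    return documents, json_file


# Usage: write a small compressed file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.json.gz")
    with gzip.open(path, "wt", encoding="utf-8") as f:
        json.dump([{"src": ["a sentence"]}], f)
    documents, file_path = load_json_sketch(path)
```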
- helpers.pad(data, pad_id, width=None, pad_on_left=False, nearest_multiple_of=False)[source]
Pad data with pad_id to width, on the right by default or on the left if pad_on_left is set.
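The padding behavior can be sketched for a list of token-id lists as shown below. The name pad_sketch is hypothetical, and two details are assumptions not stated in this reference: that width defaults to the longest sequence, and that nearest_multiple_of rounds the width up to the next multiple of the given number.

```python
import math


def pad_sketch(data, pad_id, width=None, pad_on_left=False, nearest_multiple_of=None):
    """Pad each list in data with pad_id until it is width items long."""
    if width is None:
        # Assumption: default to the longest sequence in the batch.
        width = max(len(row) for row in data)
    if nearest_multiple_of:
        # Assumption: round the width up to the next multiple.
        width = math.ceil(width / nearest_multiple_of) * nearest_multiple_of

    padded = []
    for row in data:
        filler = [pad_id] * (width - len(row))  # empty if row is already long enough
        padded.append(filler + row if pad_on_left else row + filler)
    return padded


batch = [[1, 2, 3], [4]]
right = pad_sketch(batch, pad_id=0)
left = pad_sketch(batch, pad_id=0, pad_on_left=True)
rounded = pad_sketch(batch, pad_id=0, nearest_multiple_of=4)
```

Note that this sketch never truncates: a row longer than width is returned unchanged.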
- helpers.pad_tensors(tensors, pad_id=0, width=None, pad_on_left=False, nearest_multiple_of=False)[source]
Pad tensors with pad_id to width, on the right by default or on the left if pad_on_left is set.
- helpers.test_rouge(temp_dir, cand, ref)[source]
Compute ROUGE scores using the official ROUGE 1.5.5 package. This function uses the pyrouge Python module to interface with the official ROUGE script. There should be a “<q>” token between each sentence in the cand and ref files. pyrouge splits sentences based on newlines, but we cannot easily store all the summaries in a single text file if there is a newline between each sentence, since newlines mark new summaries. Thus, the “<q>” token is used in the text files and is converted to a newline in this function. Using “<q>” instead of \n also makes it easier to store the ground-truth summaries in the convert_to_extractive.py script.
- Parameters:
temp_dir (str) – A temporary folder to store files for input to the ROUGE script.
cand (str) – The path to the file containing one candidate summary per line with “<q>” tokens in between each sentence.
ref (str) – The path to the file containing one ground-truth/gold summary per line with “<q>” tokens in between each sentence.
- Returns:
Results from the ROUGE script as a Python dictionary.
- Return type:
dict
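The “<q>” convention can be illustrated with a small sketch of the conversion step alone. It does not call pyrouge or the ROUGE script, and the name split_summaries is hypothetical; it only shows how one summary per line with “<q>” separators maps to one sentence per line for each summary.

```python
def split_summaries(lines, sentence_token="<q>"):
    """Convert one-summary-per-line text into newline-separated sentences."""
    return [line.strip().replace(sentence_token, "\n") for line in lines]


# Each list item stands for one line of a cand or ref file.
cand_lines = ["the cat sat.<q>it purred.", "rain fell all day."]
converted = split_summaries(cand_lines)
```

Each converted string could then be written to its own file inside temp_dir, which is the one-sentence-per-line layout the description above says pyrouge expects.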