Workflow

papermill.engines

Engines to perform different roles

class papermill.engines.Engine

Bases: object

Base class for engines.

Specific engine classes should inherit from this class and implement the execute_managed_notebook method.

Defines the execute_notebook method, which sets up the NotebookExecutionManager object that engines interact with.

classmethod execute_managed_notebook(nb_man, kernel_name, **kwargs)

An abstract method where implementation will be defined in a subclass.

classmethod execute_notebook(nb, kernel_name, output_path=None, progress_bar=True, log_output=False, autosave_cell_every=30, **kwargs)

A wrapper to handle notebook execution tasks.

Wraps the notebook object in a NotebookExecutionManager in order to track execution state in a uniform manner. This simplifies engine implementations: a developer can focus on iterating over the cells and executing their contents.
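The division of labor described here can be sketched with standard-library stand-ins. SketchManager, SketchEngine and UppercaseEngine are hypothetical names, not papermill classes; they mirror only the documented flow (wrap the notebook, signal the start, delegate to execute_managed_notebook, signal completion regardless of exceptions). Unlike a real engine, the sketch returns the manager so its recorded events can be inspected.

```python
class SketchManager:
    """Hypothetical stand-in for NotebookExecutionManager (not papermill code)."""

    def __init__(self, nb):
        self.nb = nb
        self.events = []

    def notebook_start(self):
        self.events.append("start")

    def notebook_complete(self):
        self.events.append("complete")


class SketchEngine:
    """Hypothetical base class mirroring the documented wrapper flow."""

    @classmethod
    def execute_managed_notebook(cls, nb_man, kernel_name, **kwargs):
        raise NotImplementedError("subclasses implement the cell loop")

    @classmethod
    def execute_notebook(cls, nb, kernel_name, **kwargs):
        nb_man = SketchManager(nb)
        nb_man.notebook_start()
        try:
            cls.execute_managed_notebook(nb_man, kernel_name, **kwargs)
        finally:
            # Runs whether or not the subclass raised.
            nb_man.notebook_complete()
        return nb_man  # returned for inspection in this sketch only


class UppercaseEngine(SketchEngine):
    """Toy subclass: 'executes' a cell by upper-casing its source."""

    @classmethod
    def execute_managed_notebook(cls, nb_man, kernel_name, **kwargs):
        for cell in nb_man.nb["cells"]:
            cell["outputs"] = [cell["source"].upper()]


notebook = {"cells": [{"source": "print('hi')", "outputs": []}]}
engine_manager = UppercaseEngine.execute_notebook(notebook, kernel_name="python3")
```

The subclass only contains the cell loop; the start and completion bookkeeping stays in the base class.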

class papermill.engines.NBClientEngine

Bases: papermill.engines.Engine

A notebook engine representing an nbclient process.

This can execute a notebook document and update the nb_man.nb object with the results.

classmethod execute_managed_notebook(nb_man, kernel_name, log_output=False, stdout_file=None, stderr_file=None, start_timeout=60, execution_timeout=None, **kwargs)

Performs the actual execution of the parameterized notebook locally.

Parameters
  • nb_man (NotebookExecutionManager) – Wrapper for the notebook being executed and its execution state.

  • kernel_name (str) – Name of kernel to execute the notebook against.

  • log_output (bool) – Flag for whether or not to write notebook output to the configured logger.

  • start_timeout (int) – Duration to wait for kernel start-up.

  • execution_timeout (int) – Duration to wait before failing execution (default: never).

class papermill.engines.NotebookExecutionManager(nb, output_path=None, log_output=False, progress_bar=True, autosave_cell_every=30)

Bases: object

Wrapper for execution state of a notebook.

This class is a wrapper for notebook objects to house execution state related to the notebook being run through an engine.

In particular the NotebookExecutionManager provides common update callbacks for use within engines to facilitate metadata and persistence actions in a shared manner.

COMPLETED = 'completed'
FAILED = 'failed'
PENDING = 'pending'
RUNNING = 'running'
autosave_cell()

Saves the notebook if it’s been more than self.autosave_cell_every seconds since it was last saved.
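The interval check can be illustrated with a small stand-in. AutosaveSketch, its clock argument and its save_count attribute are assumptions made for this example and are not part of papermill; only the "save when more than autosave_cell_every seconds have passed" rule comes from the description above.

```python
import time


class AutosaveSketch:
    """Hypothetical stand-in; only the elapsed-time check is illustrated."""

    def __init__(self, autosave_cell_every=30, clock=time.monotonic):
        self.autosave_cell_every = autosave_cell_every
        self.clock = clock
        self.last_save_time = clock()
        self.save_count = 0

    def save(self):
        self.save_count += 1
        self.last_save_time = self.clock()

    def autosave_cell(self):
        # Save only if more than the configured interval has elapsed.
        if self.clock() - self.last_save_time > self.autosave_cell_every:
            self.save()


# A fake clock makes the behaviour easy to follow without sleeping.
fake_now = [0.0]
sketch = AutosaveSketch(autosave_cell_every=30, clock=lambda: fake_now[0])

fake_now[0] = 10.0
sketch.autosave_cell()  # only 10 s since the last save: nothing happens
saves_after_10s = sketch.save_count

fake_now[0] = 45.0
sketch.autosave_cell()  # 45 s since the last save: the notebook is saved
saves_after_45s = sketch.save_count
```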

cell_complete(cell, cell_index=None, **kwargs)

Finalize metadata for a cell and save notebook.

Optionally called by engines during execution to finalize the metadata for a cell and save the notebook to the output path.

cell_exception(cell, cell_index=None, **kwargs)

Set metadata when an exception is raised.

Called by engines when an exception is raised within a notebook to set the metadata on the notebook indicating the location of the failure.

cell_start(cell, cell_index=None, **kwargs)

Set and save a cell’s start state.

Optionally called by engines during execution to initialize the metadata for a cell and save the notebook to the output path.
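How the three cell callbacks might cooperate is sketched below using plain dictionaries. The status strings match the class constants listed above, but the "sketch" metadata key and the start_time/end_time field names are assumptions for illustration, and saving to the output path is omitted.

```python
from datetime import datetime, timezone

PENDING, RUNNING, COMPLETED, FAILED = "pending", "running", "completed", "failed"


def utc_now():
    """Current time as a timezone-aware UTC datetime."""
    return datetime.now(timezone.utc)


def cell_start(cell):
    # Initialize the per-cell metadata and mark the cell as running.
    meta = cell.setdefault("metadata", {}).setdefault("sketch", {})
    meta["status"] = RUNNING
    meta["start_time"] = utc_now().isoformat()


def cell_exception(cell):
    # Record the failure on the cell that raised (call cell_start first).
    cell["metadata"]["sketch"]["status"] = FAILED


def cell_complete(cell):
    meta = cell["metadata"]["sketch"]
    meta["end_time"] = utc_now().isoformat()
    # Keep a FAILED status set by cell_exception; otherwise mark completed.
    if meta["status"] != FAILED:
        meta["status"] = COMPLETED


ok_cell = {"source": "1 + 1"}
bad_cell = {"source": "1 / 0"}

cell_start(ok_cell)
cell_complete(ok_cell)

cell_start(bad_cell)
cell_exception(bad_cell)
cell_complete(bad_cell)
```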

cleanup_pbar()

Clean up a progress bar

complete_pbar()

Refresh progress bar

notebook_complete(**kwargs)

Finalize the metadata for a notebook and save the notebook to the output path.

Called by Engine when execution concludes, regardless of exceptions.

notebook_start(**kwargs)

Initialize a notebook, clearing its metadata, and save it.

When starting a notebook, this initializes and clears the metadata for the notebook and its cells, and saves the notebook to the given output path.

Called by Engine when execution begins.

now()

Helper to return current UTC time

save(**kwargs)

Saves the wrapped notebook state.

If an output path is known, this triggers a save of the wrapped notebook state to the provided path.

Can be used outside of cell state changes if execution is taking a long time to conclude but the notebook object should be synced.

For example, you may want to save the notebook every 10 minutes when running a 5 hour cell execution to capture output messages in the notebook.

set_timer()

Initializes the execution timer for the notebook.

This is called automatically when a NotebookExecutionManager is constructed.

class papermill.engines.PapermillEngines

Bases: object

The holder which houses any engine registered with the system.

This object is used in a singleton manner to save and load particular named Engine objects so they may be referenced externally.

execute_notebook_with_engine(engine_name, nb, kernel_name, **kwargs)

Fetch a named engine and execute the nb object against it.

get_engine(name=None)

Retrieves an engine by name.

register(name, engine)

Register a named engine

register_entry_points()

Register entrypoints for an engine

Load handlers provided by other packages
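The registry idea can be sketched as a dictionary keyed by engine name. EngineRegistrySketch and CountingEngine are hypothetical stand-ins written for this example; entry-point discovery is left out, and the error raised for an unknown name is an assumption.

```python
class EngineRegistrySketch:
    """Hypothetical stand-in for a name -> engine registry."""

    def __init__(self):
        self._engines = {}

    def register(self, name, engine):
        self._engines[name] = engine

    def get_engine(self, name=None):
        engine = self._engines.get(name)
        if engine is None:
            raise KeyError(f"No engine named {name!r} found")
        return engine

    def execute_notebook_with_engine(self, engine_name, nb, kernel_name, **kwargs):
        # Look the engine up by name, then delegate execution to it.
        return self.get_engine(engine_name).execute_notebook(nb, kernel_name, **kwargs)


class CountingEngine:
    """Toy engine: reports how many cells it was given."""

    @classmethod
    def execute_notebook(cls, nb, kernel_name, **kwargs):
        return {"kernel": kernel_name, "cell_count": len(nb["cells"])}


# One shared module-level instance plays the singleton role.
registry = EngineRegistrySketch()
registry.register("counting", CountingEngine)
result = registry.execute_notebook_with_engine("counting", {"cells": [{}, {}]}, "python3")
```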

papermill.engines.catch_nb_assignment(func)

Wrapper to catch nb keyword arguments

This helps catch nb keyword arguments and assign onto self when passed to the wrapped function.

Used for callback methods when the caller may optionally have a new copy of the originally wrapped nb object.
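A decorator with this behavior could look roughly like the following. This is a sketch written from the description above, not papermill's source; catch_nb_assignment_sketch and ManagerSketch are hypothetical names.

```python
from functools import wraps


def catch_nb_assignment_sketch(func):
    """If the caller passes nb=..., store it on self before calling func."""

    @wraps(func)
    def wrapper(self, *args, **kwargs):
        nb = kwargs.get("nb")
        if nb is not None:
            # The caller holds a newer copy of the notebook: adopt it.
            self.nb = nb
        return func(self, *args, **kwargs)

    return wrapper


class ManagerSketch:
    def __init__(self, nb):
        self.nb = nb

    @catch_nb_assignment_sketch
    def cell_complete(self, cell, cell_index=None, **kwargs):
        # Report how many cells the currently held notebook has.
        return len(self.nb["cells"])


original = {"cells": [{"source": "a = 1"}]}
replacement = {"cells": [{"source": "a = 1"}, {"source": "a + 1"}]}

nb_manager = ManagerSketch(original)
count_without_nb = nb_manager.cell_complete(original["cells"][0])
count_with_nb = nb_manager.cell_complete(replacement["cells"][1], nb=replacement)
```

Because the wrapped callbacks accept **kwargs, the extra nb keyword passes through without changing their signatures.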

papermill.execute

papermill.execute.execute_notebook(input_path, output_path, parameters=None, engine_name=None, request_save_on_cell_execute=True, prepare_only=False, kernel_name=None, language=None, progress_bar=True, log_output=False, stdout_file=None, stderr_file=None, start_timeout=60, report_mode=False, cwd=None, **engine_kwargs)

Executes a single notebook locally.

Parameters
  • input_path (str or Path) – Path to input notebook

  • output_path (str or Path) – Path to save executed notebook

  • parameters (dict, optional) – Arbitrary keyword arguments to pass to the notebook parameters

  • engine_name (str, optional) – Name of execution engine to use

  • request_save_on_cell_execute (bool, optional) – Request save notebook after each cell execution

  • autosave_cell_every (int, optional) – How often in seconds to save in the middle of long cell executions

  • prepare_only (bool, optional) – Flag to determine if execution should occur or not

  • kernel_name (str, optional) – Name of kernel to execute the notebook against

  • language (str, optional) – Programming language of the notebook

  • progress_bar (bool, optional) – Flag for whether or not to show the progress bar.

  • log_output (bool, optional) – Flag for whether or not to write notebook output to the configured logger

  • start_timeout (int, optional) – Duration in seconds to wait for kernel start-up

  • report_mode (bool, optional) – Flag for whether or not to hide input.

  • cwd (str or Path, optional) – Working directory to use when executing the notebook

  • **engine_kwargs – Arbitrary keyword arguments to pass to the notebook engine

Returns

nb – Executed notebook object

Return type

NotebookNode
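A call might be assembled as in the sketch below. The paths, the alpha parameter and the python3 kernel name are placeholders, and build_run_kwargs and run_notebook are hypothetical helpers. The keyword names follow the signature documented above, and the papermill import is deferred so the helper can be exercised without a kernel.

```python
from pathlib import Path


def build_run_kwargs(input_path, output_path, parameters):
    """Collect keyword arguments matching the documented signature."""
    input_path = Path(input_path)
    return {
        "input_path": str(input_path),
        "output_path": str(output_path),
        "parameters": dict(parameters),
        "kernel_name": "python3",  # assumed kernel name
        "progress_bar": False,
        "log_output": True,
        "start_timeout": 120,
        # Run the notebook from the directory that contains it.
        "cwd": str(input_path.parent),
    }


def run_notebook(input_path, output_path, parameters):
    """Execute the notebook and return the executed notebook object."""
    # Imported here so build_run_kwargs works without papermill installed.
    import papermill as pm

    return pm.execute_notebook(**build_run_kwargs(input_path, output_path, parameters))


run_kwargs = build_run_kwargs("reports/input.ipynb", "reports/output.ipynb", {"alpha": 0.6})
```

Calling run_notebook with real paths requires papermill and a matching Jupyter kernel to be installed.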

papermill.execute.prepare_notebook_metadata(nb, input_path, output_path, report_mode=False)

Prepare metadata associated with a notebook and its cells

Parameters
  • nb (NotebookNode) – Executable notebook object

  • input_path (str) – Path to input notebook

  • output_path (str) – Path to write executed notebook

  • report_mode (bool, optional) – Flag to set report mode

papermill.execute.raise_for_execution_errors(nb, output_path)

Checks the executed notebook for cell execution errors and raises an exception if any are found.

Parameters
  • nb (NotebookNode) – Executable notebook object

  • output_path (str) – Path to write executed notebook
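Judging by the function's name, a check for execution errors can be sketched as a scan over each code cell's outputs for entries whose output_type is "error", which is how the notebook format records exceptions. NotebookErrorSketch, find_first_error and raise_for_errors_sketch are hypothetical names, and the notebook is a plain dictionary rather than a NotebookNode.

```python
class NotebookErrorSketch(Exception):
    """Hypothetical exception type used only in this sketch."""


def find_first_error(nb):
    """Return (cell_index, error_output) for the first error output, else None."""
    for index, cell in enumerate(nb["cells"]):
        if cell.get("cell_type") != "code":
            continue
        for output in cell.get("outputs", []):
            if output.get("output_type") == "error":
                return index, output
    return None


def raise_for_errors_sketch(nb):
    found = find_first_error(nb)
    if found is not None:
        index, output = found
        raise NotebookErrorSketch(f"cell {index}: {output['ename']}: {output['evalue']}")


executed = {
    "cells": [
        {"cell_type": "markdown", "source": "# Title"},
        {
            "cell_type": "code",
            "source": "1 / 0",
            "outputs": [
                {
                    "output_type": "error",
                    "ename": "ZeroDivisionError",
                    "evalue": "division by zero",
                    "traceback": [],
                }
            ],
        },
    ]
}

try:
    raise_for_errors_sketch(executed)
    error_message = None
except NotebookErrorSketch as exc:
    error_message = str(exc)
```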

papermill.execute.remove_error_markers(nb)

papermill.clientwrap

class papermill.clientwrap.PapermillNotebookClient(**kwargs)

Bases: nbclient.client.NotebookClient

A notebook client that executes the code cells and updates outputs.

execute(**kwargs)

Wraps the parent class process call slightly

log_output

A boolean (True, False) trait.

log_output_message(output)

Process a given output. May log it in the configured logger and/or write it into the configured stdout/stderr files.

Parameters
  • output (nbformat.notebooknode.NotebookNode) – The output to process.
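The routing that log_output_message describes can be sketched for stream outputs as follows. This is an illustration rather than papermill's implementation: log_output_message_sketch is a hypothetical function, outputs are plain dictionaries in the notebook format's stream shape (output_type, name, text), and non-stream outputs are simply ignored.

```python
import io
import logging

logger = logging.getLogger("log_output_sketch")


def log_output_message_sketch(output, log_output=False, stdout_file=None, stderr_file=None):
    """Route a stream output to the logger and/or the matching file object."""
    if output.get("output_type") != "stream":
        return  # this sketch ignores non-stream outputs
    text = output.get("text", "")
    if isinstance(text, list):
        # The notebook format allows multi-line text as a list of strings.
        text = "".join(text)
    is_stderr = output.get("name") == "stderr"
    if log_output:
        logger.log(logging.WARNING if is_stderr else logging.INFO, text.rstrip())
    target = stderr_file if is_stderr else stdout_file
    if target is not None:
        target.write(text)


out_buffer, err_buffer = io.StringIO(), io.StringIO()
for message in (
    {"output_type": "stream", "name": "stdout", "text": "step 1 done\n"},
    {"output_type": "stream", "name": "stderr", "text": ["warn: ", "slow\n"]},
    {"output_type": "display_data", "data": {"text/plain": "<Figure>"}},
):
    log_output_message_sketch(message, stdout_file=out_buffer, stderr_file=err_buffer)
```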

papermill_execute_cells()

This function replaces cell execution with its own wrapper.

We are doing this for the following reasons:

  1. Notebooks will stop executing when they encounter a failure but will not raise a CellExecutionError. This allows us to save the notebook with the traceback even though an error was encountered.

  2. We want to write the notebook as cells are executed. We inject our logic for that here.

  3. We want to include timing and execution status information with the metadata of each cell.
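The three reasons above translate into a loop of roughly the following shape. This is a simplified sketch, not the method's actual body: run_cells_sketch, toy_run and the status, duration and error keys are hypothetical, and the save callback stands in for writing the notebook to disk.

```python
import time


def run_cells_sketch(cells, run_cell, save):
    """Run cells in order, record status and timing, stop at the first failure."""
    for index, cell in enumerate(cells):
        started = time.perf_counter()
        cell["status"] = "running"
        try:
            cell["result"] = run_cell(cell["source"])
            cell["status"] = "completed"
        except Exception as exc:
            # Record the failure instead of raising, then stop executing.
            cell["status"] = "failed"
            cell["error"] = repr(exc)
            break
        finally:
            # Runs for every executed cell, including the failing one.
            cell["duration"] = time.perf_counter() - started
            save(index)
    return cells


def toy_run(source):
    """Toy 'kernel': succeeds only when the source is an integer literal."""
    return int(source)


saved_indices = []
cells = [{"source": "2"}, {"source": "oops"}, {"source": "3"}]
run_cells_sketch(cells, toy_run, saved_indices.append)
```

The failing cell is still saved with its error recorded, and the third cell is never started.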

process_message(*arg, **kwargs)

Processes a kernel message, updates cell state, and returns the resulting output object that was appended to cell.outputs.

The input argument cell is modified in-place.

Parameters
  • msg (dict) – The kernel message being processed.

  • cell (nbformat.NotebookNode) – The cell which is currently being processed.

  • cell_index (int) – The position of the cell within the notebook object.

Returns

output – The execution output payload (or None for no output).

Return type

dict

Raises

CellExecutionComplete – Once a message arrives which indicates computation completeness.

stderr_file

A trait whose value must be an instance of a specified class.

The value can also be an instance of a subclass of the specified class.

Subclasses can declare default classes by overriding the klass attribute

stdout_file

A trait whose value must be an instance of a specified class.

The value can also be an instance of a subclass of the specified class.

Subclasses can declare default classes by overriding the klass attribute