Storage

Azure

These modules outline how to interact with Azure data stores, specifically Azure Blob Storage and Azure Data Lakes.

papermill.abs module

papermill.adl module

AWS

This module shows how to interact with AWS S3 data stores.

papermill.s3 module

Utilities for working with S3.

class papermill.s3.Bucket(name, service=None)

Bases: object

Represents a Bucket of storage on S3

Parameters
  • name (string) – name of the bucket

  • service (string, optional (Default is None)) – name of a service resource, such as SQS, EC2, etc.

list(prefix='', delimiter=None)

Lists the Bucket’s objects, filtered by prefix and delimiter.

class papermill.s3.Key(bucket, name, size=None, etag=None, last_modified=None, storage_class=None, service=None)

Bases: object

A key that represents a unique object in an S3 Bucket.

Represents a file or stream.

Parameters
  • bucket (object) – A bucket of S3 storage

  • name (string) – name of the key within the bucket

  • size (???, optional (Default is None)) –

  • etag (???, optional (Default is None)) –

  • last_modified (date, optional (Default is None)) –

  • storage_class (???, optional (Default is None)) –

  • service (string, optional (Default is None)) – name of a service resource, such as SQS, EC2, etc.

class papermill.s3.Prefix(bucket, name, service=None)

Bases: object

Represents a prefix used in an S3 Bucket.

Parameters
  • bucket (object) – A bucket of S3 storage

  • name (string) – name of the prefix within the bucket

  • service (string, optional (Default is None)) – name of a service resource, such as SQS, EC2, etc.

class papermill.s3.S3(keyname=None, *args, **kwargs)

Bases: object

Wraps S3.

Parameters

keyname (TODO) –

The following are wrapped utilities for S3:
  • cat

  • cp_string

  • list

  • listdir

  • read

cat(source, buffersize=None, memsize=16777216, compressed=False, encoding='UTF-8', raw=False)

Returns an iterator over the data in the key, or nothing if the key doesn’t exist. Data is decompressed on the fly (if compressed is True or the key ends with .gz) unless raw is True. Pass None for encoding to skip decoding.
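The interplay of compressed, raw and encoding can be illustrated without touching S3. The following is a minimal sketch on in-memory bytes, not papermill's implementation: iter_text, key_name and chunk_size are hypothetical names chosen for this example, and it assumes raw only disables decompression while encoding alone controls decoding.

```python
import codecs
import gzip
import io


def iter_text(data, key_name, compressed=False, encoding="UTF-8", raw=False, chunk_size=1024):
    """Yield chunks of `data`, decompressing and decoding as the flags request."""
    stream = io.BytesIO(data)
    # Decompress when asked explicitly or when the key name ends with .gz,
    # unless the caller asked for the raw stored bytes.
    if not raw and (compressed or key_name.endswith(".gz")):
        stream = gzip.GzipFile(fileobj=stream)
    # An incremental decoder copes with multi-byte characters split across chunks.
    decoder = codecs.getincrementaldecoder(encoding)() if encoding else None
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        # encoding=None leaves the chunk as bytes.
        yield decoder.decode(chunk) if decoder else chunk
    if decoder:
        tail = decoder.decode(b"", final=True)
        if tail:
            yield tail
```

Passing raw=True together with encoding=None returns the stored bytes unchanged, which is the combination to use when the consumer wants to handle a .gz object itself.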

cp_string(source, dest, **kwargs)

Copies source string into the destination location.

Parameters
  • source (string) – the string with the content to copy

  • dest (string) – the s3 location
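A minimal usage sketch of cp_string together with read, based only on the signatures listed in this section. It assumes papermill is installed with S3 support and that AWS credentials are configured; the bucket name and the helper write_and_read_back are placeholders invented for this example.

```python
def write_and_read_back(dest="s3://my-example-bucket/notes/hello.txt"):
    """Write a short string to S3 and read it back line by line."""
    # Imported lazily so that defining this helper needs neither papermill
    # nor AWS access; calling it needs both.
    from papermill.s3 import S3

    s3 = S3()
    # source is the content itself, dest is the s3:// location to write to.
    s3.cp_string("first line\nsecond line\n", dest)
    # read() is a generator over the lines of the object.
    return list(s3.read(dest))
```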

list(name, iterator=False, **kwargs)

Returns a list of the files under the specified path. name must be in the form of s3://bucket/prefix.

Parameters
  • keys (optional) – if True then this will return the actual boto keys for files that are encountered

  • objects (optional) – if True then this will return the actual boto objects for files or prefixes that are encountered

  • delimiter (optional) – if set this

  • iterator (optional) – if True return iterator rather than converting to list object
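The s3://bucket/prefix form required for name splits into a bucket and a prefix. A minimal sketch of that split follows; split_s3_url is a hypothetical helper written for this example and is not part of papermill.

```python
def split_s3_url(url):
    """Split 's3://bucket/prefix' into (bucket, prefix)."""
    scheme = "s3://"
    if not url.startswith(scheme):
        raise ValueError(f"expected an s3:// URL, got {url!r}")
    # Everything up to the first slash is the bucket; the rest is the prefix.
    bucket, _, prefix = url[len(scheme):].partition("/")
    if not bucket:
        raise ValueError(f"missing bucket name in {url!r}")
    return bucket, prefix
```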

listdir(name, **kwargs)

Returns a list of the files under the specified path.

This is different from list as it will only give you files under the current directory, much like ls.

name must be in the form of s3://bucket/prefix/

Parameters
  • keys (optional) – if True then this will return the actual boto keys for files that are encountered

  • objects (optional) – if True then this will return the actual boto objects for files or prefixes that are encountered
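The difference between a recursive listing and an ls-style listing comes down to whether a delimiter is applied after the prefix. The sketch below imitates that behaviour on an in-memory list of key names; list_keys and the sample keys are invented for illustration, and how papermill itself reports sub-prefixes is an assumption here.

```python
def list_keys(keys, prefix="", delimiter=None):
    """Return entries under `prefix`; with a delimiter, stop at the first level."""
    results = []
    for key in sorted(keys):
        if not key.startswith(prefix):
            continue
        if delimiter:
            head, sep, _ = key[len(prefix):].partition(delimiter)
            # A deeper key collapses to its first path segment, e.g. "data/raw/".
            key = prefix + head + sep
        if key not in results:
            results.append(key)
    return results


keys = ["data/a.csv", "data/b.csv", "data/raw/c.csv", "logs/run.txt"]
recursive = list_keys(keys, "data/")                 # everything below data/
one_level = list_keys(keys, "data/", delimiter="/")  # direct children only
```

Here recursive contains all three keys under data/, while one_level contains the two files plus a single data/raw/ entry standing in for the nested key.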

lock = <unlocked _thread.RLock object owner=0 count=0>

read(source, compressed=False, encoding='UTF-8')

Iterates over a file in s3 split on newline.

Yields a line in file.
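Splitting a streamed object on newlines means carrying a partial line over from one chunk to the next. A minimal sketch of that technique follows; iter_lines is a hypothetical helper for this example, and whether papermill keeps or strips the trailing newline is not stated here, so this version strips it.

```python
def iter_lines(chunks):
    """Yield complete lines from an iterable of text chunks."""
    pending = ""
    for chunk in chunks:
        pending += chunk
        # Everything before the last newline is complete; the remainder waits
        # for the next chunk.
        *lines, pending = pending.split("\n")
        yield from lines
    # Emit a final line that has no trailing newline.
    if pending:
        yield pending
```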

s3_session = (None, None, None)