Classes¶

class sqltask.base.engine.EngineContext(name, url, engine_params=None, metadata_params=None)[source]¶

create_new(database, schema)[source]¶

Create a new EngineContext based on the current instance, but with a different schema.

Parameters

database (Optional[str]) – Database to use. If left unspecified, falls back to the database provided by the original engine context
schema (Optional[str]) – Schema to use. If left unspecified, falls back to the schema provided by the original engine context

Return type

EngineContext

Returns

a new instance of EngineContext with different url

class sqltask.base.table.BaseTableContext(name, engine_context, columns, comment=None, database=None, schema=None, batch_params=None, timestamp_column_name=None, table_params=None)[source]¶

The BaseTableContext class contains everything necessary for creating/modifying a target table/schema and inserting/removing rows.

delete_rows()[source]¶

Delete old rows from target table that match batch parameters.

Return type: None

get_new_row()[source]¶

Get a new row intended to be added to the table.

Return type: BaseOutputRow

insert_rows()[source]¶

Insert rows into target tables.

Return type: None

map_all(row_source, mappings=None, funcs=None)[source]¶

Convenience method for mapping all rows and columns from the input row source to the output table in a one-to-one fashion. The optional arguments mappings and funcs can be used to specify alternative column name mappings and conversion functions.

Parameters

row_source (BaseRowSource) – Input row source to map to the outout table.
mappings (Optional[Dict[str, str]]) – mapping from target column name to source column name. If the source and target names differ for one or several columns, these can be specified here. Example: {“customer_name”: “cust_n”} would map the values in the source column “cust_n” to the target column “customer_name”.
funcs (Optional[Dict[str, Callable[[Any], Any]]]) – mapping from target column name to callable function. If the source and target types differ for one or several columns, a callable can be specified here. Typically this is needed when ingesting data from a CSV file where the source data types are always strings, but might need to be cast to int, float or Decimal. Example: {“customer_age”: int} would call int() on the source value.

Return type

None

migrate_schema()[source]¶

Migrate table schema to correspond to table definition.

Return type: None

class sqltask.base.table.BaseOutputRow(table_context)[source]¶

A class for storing cell values for a single row in a TableContext table. When the object is created, all batch parameters are prepopulated.

append()[source]¶

Append the row to the target table. append() should only be called once all cell values for the row have been fully populated, as any changes.

Return type: None

map_all(input_row, mappings=None, funcs=None, columns=None, auto_append=False)[source]¶

Convenience method for mapping column values one-to-one from an input row to the output row. Will only map any unmapped columns, i.e. if the target row has columns “customer_id” and “customer_name”, and “customer_name” has already been populated, only “customer_id” will be mapped.

Parameters

input_row (Mapping[str, Any]) – the input row to map values from.
mappings (Optional[Dict[str, str]]) – mapping from target column name to source column name. If the source and target names differ for one or several columns, these can be specified here. Example: {“customer_name”: “cust_n”} would map the values in the source column “cust_n” to the target column “customer_name”.
funcs (Optional[Dict[str, Callable[[Any], Any]]]) – mapping from target column name to callable function. If the source and target types differ for one or several columns, a callable can be specified here. Typically this is needed when ingesting data from a CSV file where the source data types are always strings, but might need to be cast to int, float or Decimal. Example: {“customer_age”: int} would call int() on the source value.
columns (Optional[Sequence[str]]) – A list of column names to map. If undefined, tries to map all unmapped columns in target row.
auto_append (bool) – Call append after mapping rows if the mapping operation is successful.

Return type

None

class sqltask.base.table.DqTableContext(name, engine_context, columns, comment=None, schema=None, batch_params=None, timestamp_column_name=None, table_params=None, dq_table_name=None, dq_engine_context=None, dq_schema=None, dq_info_column_names=None, dq_table_params=None)[source]¶

A TableContext child class with support for logging data quality issues to a separate data quality table. A with the ability to log data quality issues

delete_rows()[source]¶

Delete old rows from target table that match batch parameters.

Return type: None

get_new_row()[source]¶

Get a new row intended to be added to the table.

Return type: DqOutputRow

insert_rows()[source]¶

Insert rows into target tables.

Return type: None

migrate_schema()[source]¶

Migrate table schema to correspond to table definition.

Return type: None

class sqltask.base.table.DqOutputRow(table_context)[source]¶

log_dq(column_name, category, priority, source, message=None)[source]¶

Log data quality issue to be recorded in data quality table. If logging has been disabled by calling set_logging_enabled(False), data quality issues will be ignored.

Parameters

column_name (Optional[str]) – Name of affected column in target table.
category (Category) – The type of data quality issue.
source (Source) – To what phase the data quality issue relates.
priority (Priority) – What the priority of the data quality issue is. Should be None for aggregate data quality issues.
message (Optional[str]) – Verbose description of observed issue.

Return type

None

set_logging_enabled(enabled)[source]¶

If logging is set to false, data quality issues will not be passed to the log table. This is useful for rows with lower priority data, e.g. inactive users, whose data quality may be of poorer quality due to being stale.

Parameters: enabled (bool) – set to True to log issues; False to ignore calls to log_dq`
Return type: None

class sqltask.base.lookup_source.BaseLookupSource(name, row_source, keys)[source]¶

get(*unnamed_keys, **named_keys)[source]¶

Get a value from the lookup. Assuming the key for a Lookup is key1, key2, key3, the following are valid calls:

>>> # only unnamed keys
>>> lookup.get("val1", "val2", "val3")
>>> # only named keys in non-original order
>>> lookup.get(key3="val3", key1="val1", key2="val2")
>>> # both named and unnamed keys
>>> lookup.get("val1", key3="val3", key2="val2")

If a row is not found in the lookup table, the method returns an empty dict.

Parameters

unnamed_keys – unnamed key values to be used as keys
named_keys – named key values to be used as keys

Return type

Dict[str, Any]

Returns

A dict with keys as the column name and values as the cell values. If key undefined in internal dict return an empty dict.

class sqltask.base.row_source.BaseRowSource(name=None)[source]¶: Base class for data sources that return iterable rows. A row from a BaseRowSource can be any Mapping from a key (=column name) to a value (=cell value) that can be referenced as follows: >>> for row in rows: >>> column_value = row[“column_name”]