DefaultPTRImplementation

class VmaxBuilder.protein.ptr_implementation.DefaultPTRImplementation

Generated: validation needed.

Description:

PTR preprocessing and combination logic for the expression+PTR protein abundance pathway. Covers standardisation, deduplication, log→linear conversion, within-sample imputation, and expansion to the full expression gene index.

Public Methods

combine_expression_with_ptr(…)

Generated: validation needed.

get_latest_preparation_diagnostics(…)

Generated: validation needed.

get_weights(…)

Generated: validation needed.

impute_unobserved_genes(…)

Generated: validation needed.

impute_within_tissue_ptrs(…)

Generated: validation needed.

prepare_ptr_frame(…)

Generated: validation needed.

remove_ptr_duplicates(…)

Generated: validation needed.

requires_sample_type_map(…)

Generated: validation needed.

resolve_ptr_frame(…)

Generated: validation needed.

resolve_sample_type_map(…)

Generated: validation needed.

resolve_special_gene_groups(…)

Generated: validation needed.

standardize_ptr_frame(…)

Generated: validation needed.

transform_ptr_to_linear(…)

Generated: validation needed.

resolve_ptr_frame(scaffold: Scaffold, config: APIConfig) → DataFrame | None

Generated: validation needed.

Description:

Resolve PTR dataframe from configured scaffold/config sources.

Parameters:
  • scaffold (Scaffold) – Shared pipeline scaffold.

  • config (APIConfig) – Root API configuration.

Returns:

pd.DataFrame | None – PTR dataframe when available.

static standardize_ptr_frame(ptr_df: DataFrame) → DataFrame

Generated: validation needed.

Description:

Standardize missing-value tokens, trim string whitespace, and coerce all values to numeric. Resets integer-indexed frames to use the first column as the index. Normalises column names to lower-case.

Parameters:

ptr_df (pd.DataFrame) – Raw PTR table (genes × samples).

Returns:

pd.DataFrame – Numeric PTR frame with standardized missing values and lower-case column names.
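The standardisation steps above can be sketched roughly as follows, assuming pandas. The function name standardize_ptr_sketch and the MISSING_TOKENS list are illustrative assumptions; the tokens the library actually recognises are not documented here.

```python
import numpy as np
import pandas as pd

# Hypothetical token list (assumption): the real set of missing-value
# tokens is not stated in this reference.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-"}


def standardize_ptr_sketch(ptr_df: pd.DataFrame) -> pd.DataFrame:
    df = ptr_df.copy()

    # An integer index suggests the gene IDs still sit in the first column.
    if pd.api.types.is_integer_dtype(df.index):
        df = df.set_index(df.columns[0])

    # Lower-case the remaining (sample) column names.
    df.columns = [str(c).lower() for c in df.columns]

    def clean(value):
        # Trim strings and map missing-value tokens to NaN.
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in MISSING_TOKENS:
                return np.nan
        return value

    # Coerce every column to numeric; anything unparseable becomes NaN.
    return df.apply(lambda col: pd.to_numeric(col.map(clean), errors="coerce"))
```

In this sketch a raw table with a Gene column and string cells such as " 1.5 " or "NA" comes back indexed by gene, with float columns and NaN for the missing tokens.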

static remove_ptr_duplicates(ptr_df: DataFrame) → DataFrame

Generated: validation needed.

Description:

For each duplicated gene row, retain the row with the most non-missing values. When tied, keep the first occurrence.

Parameters:

ptr_df (pd.DataFrame) – PTR table potentially containing duplicate gene identifiers in the index.

Returns:

pd.DataFrame – PTR table with unique gene index.
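One way to express the "most non-missing values, first occurrence on ties" rule with pandas is sketched below. The function name dedupe_sketch is hypothetical, and the library may implement the rule differently.

```python
import numpy as np
import pandas as pd


def dedupe_sketch(ptr_df: pd.DataFrame) -> pd.DataFrame:
    # Rank each row by its count of observed values and its original position.
    ranked = pd.DataFrame(
        {
            "gene": ptr_df.index.to_numpy(),
            "count": ptr_df.notna().sum(axis=1).to_numpy(),
            "pos": np.arange(len(ptr_df)),
        }
    )
    # Highest count first; among ties, the earliest row comes first.
    best = ranked.sort_values(["count", "pos"], ascending=[False, True])
    best = best.drop_duplicates("gene")  # keeps the first row per gene

    # Restore the original row order of the surviving rows.
    keep = np.sort(best["pos"].to_numpy())
    return ptr_df.iloc[keep]
```

Selecting by position rather than by label keeps the sketch correct even though the index labels are not unique on input.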

static transform_ptr_to_linear(ptr_df: DataFrame, pretransformed_type: str = 'linear') → DataFrame

Generated: validation needed.

Description:

Convert a PTR frame to linear space from its configured transform state. The key none is accepted as an alias for linear.

Parameters:
  • ptr_df (pd.DataFrame) – PTR table in source transform space.

  • pretransformed_type (str) – Source transform key. One of linear, log10, log2, ln.

Returns:

pd.DataFrame – PTR table transformed to linear space.

Raises:

ValueError – When pretransformed_type is unsupported.
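The conversion amounts to applying the inverse of the named logarithm. The sketch below assumes plain logarithms with no pseudocount, which this reference does not confirm. The function name to_linear_sketch and the key normalisation (trim and lower-case) are also assumptions.

```python
import numpy as np
import pandas as pd

# Inverse transform per source key; "none" is treated as an alias for "linear".
_INVERSES = {
    "linear": lambda df: df,
    "none": lambda df: df,
    "log10": lambda df: 10.0 ** df,
    "log2": lambda df: 2.0 ** df,
    "ln": lambda df: np.exp(df),
}


def to_linear_sketch(ptr_df: pd.DataFrame, pretransformed_type: str = "linear") -> pd.DataFrame:
    key = str(pretransformed_type).strip().lower()
    if key not in _INVERSES:
        raise ValueError(f"Unsupported pretransformed_type: {pretransformed_type!r}")
    # NaN cells stay NaN under every inverse transform.
    return _INVERSES[key](ptr_df.astype(float))
```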

static get_weights(df: DataFrame, col_stat_function: Callable[[Series], float]) → Series

Generated: validation needed.

Description:

Compute per-column weighting ratios for within-sample imputation.

Parameters:
  • df (pd.DataFrame) – PTR frame in linear space.

  • col_stat_function (Callable[[pd.Series], float]) – Statistic function for column aggregation and global normalisation.

Returns:

pd.Series – Weight ratio per PTR column.
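The description does not spell out how the ratio is formed. One plausible reading, shown here purely as an assumption, is that each column's statistic is divided by the same statistic taken over all column statistics. The function name weights_sketch is hypothetical.

```python
from typing import Callable

import pandas as pd


def weights_sketch(df: pd.DataFrame, col_stat_function: Callable[[pd.Series], float]) -> pd.Series:
    # Per-column statistic over observed values only.
    col_stats = df.apply(lambda col: col_stat_function(col.dropna()))
    # Assumed global normaliser: the same statistic over the column statistics.
    global_stat = col_stat_function(col_stats.dropna())
    return col_stats / global_stat
```

Under this reading a column whose typical PTR is above the global typical value gets a weight above 1, and one below it gets a weight below 1.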

static _validate_within_sample_weighting(use_weighted: bool, weighted_statistic: str | None) → None

Generated: validation needed.

Description:

Validate effective within-sample weighting inputs.

Parameters:
  • use_weighted (bool) – Weighted-imputation toggle.

  • weighted_statistic (str | None) – Column-statistic key for weighted mode.

Raises:

ValueError – When weighted mode lacks a strategy statistic.

static _resolve_within_sample_stat_functions(use_weighted: bool, weighted_statistic: str | None, imputation_statistic: str) → tuple[Callable[[Series], float], Callable[[Series], float] | None]

Generated: validation needed.

Description:

Resolve callable statistic functions used by within-sample imputation.

Parameters:
  • use_weighted (bool) – Weighted-imputation toggle.

  • weighted_statistic (str | None) – Weighted-column statistic key.

  • imputation_statistic (str) – Row-wise statistic key.

Returns:

tuple[Callable[[pd.Series], float], Callable[[pd.Series], float] | None] – Row-statistic function and optional weighted-column statistic function.

Raises:

ValueError – When requested statistic keys are unsupported.
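A minimal sketch of how statistic keys could be validated and mapped to callables. The _STATS table and the function name are assumptions. Only a subset of plausible keys is shown, and the set the library actually supports may differ.

```python
from typing import Callable, Optional, Tuple

import pandas as pd

StatFn = Callable[[pd.Series], float]

# Hypothetical key-to-function table (assumption).
_STATS = {
    "median": lambda s: float(s.median()),
    "mean": lambda s: float(s.mean()),
    "max": lambda s: float(s.max()),
    "min": lambda s: float(s.min()),
}


def resolve_stat_functions_sketch(
    use_weighted: bool,
    weighted_statistic: Optional[str],
    imputation_statistic: str,
) -> Tuple[StatFn, Optional[StatFn]]:
    if imputation_statistic not in _STATS:
        raise ValueError(f"Unsupported imputation statistic: {imputation_statistic!r}")
    row_fn = _STATS[imputation_statistic]

    # Unweighted mode needs no column statistic.
    if not use_weighted:
        return row_fn, None

    # Weighted mode must name a statistic, and it must be a known key.
    if weighted_statistic is None:
        raise ValueError("Weighted imputation requires weighted_statistic.")
    if weighted_statistic not in _STATS:
        raise ValueError(f"Unsupported weighted statistic: {weighted_statistic!r}")
    return row_fn, _STATS[weighted_statistic]
```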

static impute_within_tissue_ptrs(ptr_df: DataFrame, use_weighted: bool = True, weighted_statistic: str | None = 'median', imputation_statistic: str = 'median') → DataFrame

Generated: validation needed.

Description:

Impute missing values for genes observed in at least one sample. Weighted behaviour is controlled by use_weighted.

Parameters:
  • ptr_df (pd.DataFrame) – PTR table in linear space (genes × samples).

  • use_weighted (bool) – Apply weighted per-column scaling during within-sample imputation.

  • weighted_statistic (str | None) – Statistic for weighted column ratio.

  • imputation_statistic (str) – Statistic used for row-wise base fill.

Returns:

pd.DataFrame – PTR table with within-sample missing values filled.

Raises:

ValueError – When weighting configuration or statistic is unrecognised.
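The sketch below illustrates one way such an imputation could work, fixed to the median for brevity. It assumes the row-wise statistic supplies a base fill and that weighted mode multiplies that base by a per-column ratio. Both the formula and the name impute_within_sketch are assumptions, not the documented implementation.

```python
import pandas as pd


def impute_within_sketch(ptr_df: pd.DataFrame, use_weighted: bool = True) -> pd.DataFrame:
    # Base fill per gene: median across samples (NaN if never observed).
    row_fill = ptr_df.median(axis=1, skipna=True)

    if use_weighted:
        # Assumed weight: column median relative to the median of column medians.
        col_median = ptr_df.median(axis=0, skipna=True)
        weights = col_median / col_median.median()
    else:
        weights = pd.Series(1.0, index=ptr_df.columns)

    # Outer product gives a candidate fill for every (gene, sample) cell.
    candidates = pd.DataFrame(
        row_fill.to_numpy()[:, None] * weights.to_numpy()[None, :],
        index=ptr_df.index,
        columns=ptr_df.columns,
    )
    # Observed cells are kept; only missing cells take the candidate value.
    return ptr_df.where(ptr_df.notna(), candidates)
```

Genes with no observation in any sample stay missing here, since their row statistic is NaN; they are handled by the separate unobserved-gene step.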

static _resolve_unobserved_source_frame(ptr_df: DataFrame, strategy: str, reference_df: DataFrame | None) → DataFrame

Generated: validation needed.

Description:

Resolve source PTR frame used to compute unobserved-gene fill statistics.

Parameters:
  • ptr_df (pd.DataFrame) – PTR frame after within-sample imputation.

  • strategy (str) – Unobserved-gene strategy.

  • reference_df (pd.DataFrame | None) – Optional pre-imputation frame.

Returns:

pd.DataFrame – Source frame for per-sample statistics.

Raises:

ValueError – When before-imputation strategy is selected without a reference frame.

static _compute_per_sample_fill_values(source_df: DataFrame, statistic: str) → dict[str, float]

Generated: validation needed.

Description:

Compute one fill value per sample column from chosen source frame.

Parameters:
  • source_df (pd.DataFrame) – Source frame used for statistics.

  • statistic (str) – Aggregation statistic key.

Returns:

dict[str, float] – Per-sample fill values.

Raises:

ValueError – When statistic key is unsupported.

static _apply_global_unobserved_fill(df: DataFrame, fill_values: dict[str, float], target_gene_ids: set[str]) → DataFrame

Generated: validation needed.

Description:

Fill missing cells using one global per-sample statistic.

Parameters:
  • df (pd.DataFrame) – Target PTR frame aligned to expression index.

  • fill_values (dict[str, float]) – Per-sample fallback values.

  • target_gene_ids (set[str]) – Gene IDs eligible for unobserved-gene fill.

Returns:

pd.DataFrame – Frame with missing values filled.

static _apply_grouped_unobserved_fill(df: DataFrame, source_df: DataFrame, statistic: str, special_gene_groups: dict[str, list[str]], fallback_fill_values: dict[str, float], target_gene_ids: set[str], trace: dict[str, Any] | None = None) → DataFrame

Generated: validation needed.

Description:

Fill missing cells by special-gene groups with independent per-group statistics and global fallback.

Parameters:
  • df (pd.DataFrame) – Target PTR frame aligned to expression index.

  • source_df (pd.DataFrame) – Source frame for statistic calculation.

  • statistic (str) – Aggregation statistic key.

  • special_gene_groups (dict[str, list[str]]) – Group name to gene IDs.

  • fallback_fill_values (dict[str, float]) – Global per-sample fallback values.

  • target_gene_ids (set[str]) – Gene IDs eligible for unobserved-gene fill.

  • trace (dict[str, Any] | None) – Optional mutable trace dictionary populated with special-group mapping and assigned imputed values.

Returns:

pd.DataFrame – Frame with grouped missing-value imputation applied.

static _resolve_grouped_fill_value(*, column_name: Any, group_name: str | None, group_fill_values: dict[str, dict[str, float]], fallback_fill_values: dict[str, float]) → float

Generated: validation needed.

Description:

Resolve one grouped unobserved-gene fill value with fallback.

Parameters:
  • column_name (Any) – Sample/column identifier.

  • group_name (str | None) – Optional resolved group name.

  • group_fill_values (dict[str, dict[str, float]]) – Per-group fill values.

  • fallback_fill_values (dict[str, float]) – Global fallback fill values.

Returns:

float – Assigned fill value.
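The fallback logic can be pictured as a two-level dictionary lookup. In this sketch a group value is used only when it exists and is not NaN. That condition and the function name are assumptions, as the reference does not state when the fallback is triggered.

```python
import math
from typing import Any, Dict, Optional


def resolve_grouped_fill_sketch(
    *,
    column_name: Any,
    group_name: Optional[str],
    group_fill_values: Dict[str, Dict[str, float]],
    fallback_fill_values: Dict[str, float],
) -> float:
    if group_name is not None:
        # Look up the group's value for this sample column, if any.
        value = group_fill_values.get(group_name, {}).get(column_name)
        if value is not None and not math.isnan(value):
            return float(value)
    # No group, unknown group, or unusable group value: use the global one.
    return float(fallback_fill_values[column_name])
```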

static impute_unobserved_genes(ptr_df: DataFrame, expression_df: DataFrame, unobserved_gene_ids: set[str], strategy: str = 'sample_after_imputation', statistic: str = 'median', reference_df: DataFrame | None = None, special_gene_groups: dict[str, list[str]] | None = None, use_special_groups: bool = False, trace: dict[str, Any] | None = None) → DataFrame

Generated: validation needed.

Description:

Expand PTR to match the full gene index of expression_df. Genes present in expression but absent from PTR are filled using a per-sample statistic. sample_after_imputation computes the statistic on the incoming (already-imputed) PTR values. sample_before_imputation computes it on the pre-imputation snapshot instead; the method does not capture that snapshot itself, so the caller must supply it via reference_df.

Parameters:
  • ptr_df (pd.DataFrame) – PTR table after within-sample imputation.

  • expression_df (pd.DataFrame) – Expression table whose index defines the target gene universe.

  • unobserved_gene_ids (set[str]) – Gene IDs present in expression but absent from PTR to be filled.

  • strategy (str) – Imputation strategy for unobserved genes. One of sample_after_imputation, sample_before_imputation.

  • statistic (str) – Per-sample aggregation statistic. One of median, mean, mode, max, min.

  • reference_df (pd.DataFrame | None) – Pre-within-imputation PTR frame used when strategy='sample_before_imputation'.

  • special_gene_groups (dict[str, list[str]] | None) – Optional special groups to impute independently.

  • use_special_groups (bool) – Enable special-group independent imputation behavior.

  • trace (dict[str, Any] | None) – Optional mutable trace dictionary populated with grouped-imputation diagnostics.

Returns:

pd.DataFrame – PTR table re-indexed to expression_df.index with unobserved genes filled.

Raises:

ValueError – When strategy/statistic is unrecognised or when sample_before_imputation lacks a reference frame.
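A simplified sketch of the global (non-grouped) path, fixed to the median statistic. The function name is hypothetical, special-group handling and tracing are left out, and the real method may differ in detail.

```python
from typing import Optional, Set

import pandas as pd


def impute_unobserved_sketch(
    ptr_df: pd.DataFrame,
    expression_df: pd.DataFrame,
    unobserved_gene_ids: Set[str],
    strategy: str = "sample_after_imputation",
    reference_df: Optional[pd.DataFrame] = None,
) -> pd.DataFrame:
    # Pick the frame the per-sample statistic is computed from.
    if strategy == "sample_after_imputation":
        source = ptr_df
    elif strategy == "sample_before_imputation":
        if reference_df is None:
            raise ValueError("sample_before_imputation requires reference_df.")
        source = reference_df
    else:
        raise ValueError(f"Unsupported strategy: {strategy!r}")

    # One fill value per sample column.
    fill_values = source.median(axis=0, skipna=True)

    # Align to the expression gene index; new genes start as NaN rows.
    expanded = ptr_df.reindex(expression_df.index)
    is_target = expanded.index.isin(list(unobserved_gene_ids))
    for column in expanded.columns:
        mask = is_target & expanded[column].isna().to_numpy()
        expanded.loc[mask, column] = fill_values[column]
    return expanded
```

Restricting the fill to unobserved_gene_ids means any other missing cell is left untouched in this sketch.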

prepare_ptr_frame(ptr_df: DataFrame, expression_df: DataFrame, config: APIConfig, metabolic_genes: list[str] | None = None, model_artifact: Any | None = None) → DataFrame

Generated: validation needed.

Description:

Full PTR preprocessing pipeline: standardize → deduplicate → optionally filter to metabolic genes → convert to linear scale → impute within-sample missing values → expand to expression gene index.

Parameters:
  • ptr_df (pd.DataFrame) – Raw PTR input table (genes × tissue-types).

  • expression_df (pd.DataFrame) – Preprocessed expression table used to define the target gene universe and guide imputation.

  • config (APIConfig) – Root API configuration. PTR options read from config.ptr.

  • metabolic_genes (list[str] | None) – Optional list of gene IDs from the metabolic model. When provided and config.ptr.impute_from_metabolic_genes_only is True, PTR is filtered to this set before imputation.

  • model_artifact (Any | None) – Optional cobra-like model used to expand shorthand special gene groups such as transport_reactions.

Returns:

pd.DataFrame – Fully preprocessed PTR table aligned to the expression gene index.

get_latest_preparation_diagnostics() → dict[str, Any]

Generated: validation needed.

Description:

Return diagnostics captured during latest PTR preparation call.

Returns:

dict[str, Any] – PTR preparation diagnostics for inter-stage artifact persistence.

static resolve_special_gene_groups(config: APIConfig, model_artifact: Any | None = None, expression_gene_ids: set[str] | None = None) → dict[str, list[str]]

Generated: validation needed.

Description:

Resolve user-provided special gene groups used by PTR unobserved-gene imputation. This method enables independent group-wise imputation (e.g., transport genes or other custom partitions). Group values may contain gene IDs or reaction IDs; a transport_reactions group with an empty list auto-resolves transport-associated genes from the model.

Parameters:
  • config (APIConfig) – Root API configuration.

  • model_artifact (Any | None) – Optional cobra-like model used for shorthand and reaction-based group expansion.

  • expression_gene_ids (set[str] | None) – Optional expression-gene universe used to filter resolved group members.

Returns:

dict[str, list[str]] – Mapping of group name to normalized gene IDs.

Raises:

ValueError – When transport_reactions shorthand is requested without a model artifact.

classmethod requires_sample_type_map(ptr_method: str) → bool

Generated: validation needed.

Description:

Report whether a PTR implementation strategy requires an explicit expression→PTR sample/tissue mapping.

Parameters:

ptr_method (str) – PTR strategy key from config.protein.ptr_method.

Returns:

bool – True when the selected PTR method requires expression.sample_type_map.

static resolve_sample_type_map(expression_df: DataFrame, sample_type_map: dict[str, str] | str | None) → dict[str, str]

Generated: validation needed.

Description:

Build a {expression_column: ptr_column} mapping from the user-supplied sample_type_map.

  • None → identity map (each expression column maps to itself).

  • str → every expression column maps to that single PTR column.

  • dict → used directly; expression columns absent from the dict fall back to an identity mapping.

Labels are normalized for robust matching: lower-case, stripped whitespace, and _ptr suffix removed.

Parameters:
  • expression_df (pd.DataFrame) – Expression table whose columns define the source keys.

  • sample_type_map (dict[str, str] | str | None) – User-configured column mapping.

Returns:

dict[str, str] – Mapping of expression column → PTR column.
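The three input cases can be sketched as follows. For brevity the sketch takes the expression column labels directly instead of a DataFrame. The function names are hypothetical, and applying normalisation to lookup keys and mapped values (but not to the returned keys) is an assumption.

```python
from typing import Dict, List, Optional, Union


def normalize_label(label: object) -> str:
    # Lower-case, trim whitespace, and drop a trailing "_ptr" suffix.
    text = str(label).strip().lower()
    return text[: -len("_ptr")] if text.endswith("_ptr") else text


def resolve_map_sketch(
    expression_columns: List[str],
    sample_type_map: Optional[Union[Dict[str, str], str]],
) -> Dict[str, str]:
    if sample_type_map is None:
        # Identity map on normalised labels.
        return {c: normalize_label(c) for c in expression_columns}
    if isinstance(sample_type_map, str):
        # Every expression column maps to one PTR column.
        target = normalize_label(sample_type_map)
        return {c: target for c in expression_columns}
    # Dict: use it where it has an entry, identity otherwise.
    lookup = {normalize_label(k): normalize_label(v) for k, v in sample_type_map.items()}
    return {
        c: lookup.get(normalize_label(c), normalize_label(c))
        for c in expression_columns
    }
```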

combine_expression_with_ptr(expression_df: DataFrame, ptr_df: DataFrame, sample_type_map: dict[str, str] | str | None = None) → DataFrame

Generated: validation needed.

Description:

Multiply expression values by PTR values for each gene, using the resolved sample-type column mapping to pair expression columns with PTR columns. Genes absent from PTR retain their expression values.

Parameters:
  • expression_df (pd.DataFrame) – Preprocessed expression table (genes × expression-samples).

  • ptr_df (pd.DataFrame) – Preprocessed PTR table (genes × tissue-types).

  • sample_type_map (dict[str, str] | str | None) – Mapping from expression column names to PTR column names. str maps every expression column to the same PTR column; None falls back to direct column intersection.

Returns:

pd.DataFrame – Combined protein abundance table with same shape as expression_df.
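The combination step can be sketched as a column-wise multiplication over an already resolved {expression_column: ptr_column} mapping. The function name combine_sketch is hypothetical. Leaving unmapped columns unchanged is an assumption, as is treating genes absent from the PTR index as having a factor of 1 to model "retain their expression values".

```python
from typing import Dict

import pandas as pd


def combine_sketch(
    expression_df: pd.DataFrame,
    ptr_df: pd.DataFrame,
    column_map: Dict[str, str],
) -> pd.DataFrame:
    combined = expression_df.astype(float).copy()
    # Genes with no PTR row keep their expression value (factor of 1).
    absent = ~combined.index.isin(ptr_df.index)

    for expr_col, ptr_col in column_map.items():
        if expr_col not in combined.columns or ptr_col not in ptr_df.columns:
            continue  # unmapped columns are left unchanged in this sketch
        factor = ptr_df[ptr_col].reindex(combined.index)
        factor.loc[absent] = 1.0
        combined[expr_col] = combined[expr_col] * factor
    return combined
```

The result keeps the index and columns of expression_df, so its shape matches the input expression table.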