DefaultPTRImplementation

class VmaxBuilder.protein.ptr_implementation.DefaultPTRImplementation[source]

Generated: validation needed.

Description:: PTR preprocessing and combination logic for the expression+PTR protein abundance pathway. Covers standardisation, deduplication, log→linear conversion, within-sample imputation, and expansion to the full expression gene index.

Public Methods

`combine_expression_with_ptr`(…)	Generated: validation needed.
`get_latest_preparation_diagnostics`(…)	Generated: validation needed.
`get_weights`(…)	Generated: validation needed.
`impute_unobserved_genes`(…)	Generated: validation needed.
`impute_within_tissue_ptrs`(…)	Generated: validation needed.
`prepare_ptr_frame`(…)	Generated: validation needed.
`remove_ptr_duplicates`(…)	Generated: validation needed.
`requires_sample_type_map`(…)	Generated: validation needed.
`resolve_ptr_frame`(…)	Generated: validation needed.
`resolve_sample_type_map`(…)	Generated: validation needed.
`resolve_special_gene_groups`(…)	Generated: validation needed.
`standardize_ptr_frame`(…)	Generated: validation needed.
`transform_ptr_to_linear`(…)	Generated: validation needed.

resolve_ptr_frame(scaffold: Scaffold, config: APIConfig) → DataFrame | None[source]

Generated: validation needed.

Description:: Resolve PTR dataframe from configured scaffold/config sources.

Parameters:

scaffold (Scaffold) – Shared pipeline scaffold.
config (APIConfig) – Root API configuration.

Returns:

pd.DataFrame | None – PTR dataframe when available, otherwise None.

static standardize_ptr_frame(ptr_df: DataFrame) → DataFrame[source]

Generated: validation needed.

Description:: Standardize missing-value tokens, trim whitespace from string values, and coerce all values to numeric. If the frame has an integer index, the first column is promoted to the index. Column names are normalised to lower-case.

Parameters:: ptr_df (pd.DataFrame) – Raw PTR table (genes × samples).
Returns:: pd.DataFrame – Numeric PTR frame with standardized missing values and lower-case column names.
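The steps in this description can be illustrated with a short pandas sketch. It is not the library's implementation: the function name standardize_ptr_sketch and the MISSING_TOKENS set are hypothetical, and the sketch assumes that an integer index means the gene IDs still sit in the first column.

```python
import numpy as np
import pandas as pd

# Hypothetical token list for illustration; the real set is not documented here.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-"}


def standardize_ptr_sketch(ptr_df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative stand-in for standardize_ptr_frame, not the library code."""
    df = ptr_df.copy()
    # An integer index suggests gene IDs still live in the first column.
    if pd.api.types.is_integer_dtype(df.index.dtype):
        df = df.set_index(df.columns[0])
    df.columns = [str(col).strip().lower() for col in df.columns]

    def clean(value):
        # Trim strings and map recognised missing tokens to NaN.
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in MISSING_TOKENS:
                return np.nan
        return value

    # Anything still non-numeric after cleaning is coerced to NaN.
    return df.apply(lambda col: pd.to_numeric(col.map(clean), errors="coerce"))


raw = pd.DataFrame(
    {
        "Gene": ["g1", "g2", "g3"],
        " Liver ": [" 1.5 ", "NA", "2"],
        "Kidney": ["n/a", "3.0", "x"],
    }
)
clean_df = standardize_ptr_sketch(raw)
```

Here clean_df is indexed by gene ID, has the columns liver and kidney, and holds NaN wherever the raw cell was a missing token or could not be parsed as a number.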

static remove_ptr_duplicates(ptr_df: DataFrame) → DataFrame[source]

Generated: validation needed.

Description:: For each duplicated gene row, retain the row with the most non-missing values. When tied, keep the first occurrence.

Parameters:: ptr_df (pd.DataFrame) – PTR table potentially containing duplicate gene identifiers in the index.
Returns:: pd.DataFrame – PTR table with unique gene index.
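One way to realise the rule described here (most non-missing values wins, first occurrence breaks ties) is sketched below. The function name remove_duplicates_sketch is hypothetical, and returning genes in first-appearance order is an assumption rather than documented behaviour.

```python
import numpy as np
import pandas as pd


def remove_duplicates_sketch(ptr_df: pd.DataFrame) -> pd.DataFrame:
    """Illustrative stand-in for remove_ptr_duplicates, not the library code."""
    counts = ptr_df.notna().sum(axis=1).to_numpy()
    # A stable sort on descending count keeps the original order among ties,
    # so the first occurrence wins when two rows are equally complete.
    order = np.argsort(-counts, kind="stable")
    ranked = ptr_df.iloc[order]
    best = ranked[~ranked.index.duplicated(keep="first")]
    # Assumption: output follows the order in which genes first appeared.
    first_seen = ptr_df.index[~ptr_df.index.duplicated(keep="first")]
    return best.loc[first_seen]


ptr = pd.DataFrame(
    {"s1": [1.0, 2.0, 4.0, np.nan, 7.0], "s2": [np.nan, 3.0, 5.0, 6.0, 8.0]},
    index=["a", "b", "a", "c", "b"],
)
deduped = remove_duplicates_sketch(ptr)
```

In this example the second "a" row is kept because it has two observed values against one, while the first "b" row is kept because both "b" rows are equally complete.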

static transform_ptr_to_linear(ptr_df: DataFrame, pretransformed_type: str = 'linear') → DataFrame[source]

Generated: validation needed.

Description:: Convert a PTR frame to linear space from its declared source transform. The key none is accepted as an alias for linear.

Parameters:

ptr_df (pd.DataFrame) – PTR table in source transform space.
pretransformed_type (str) – Source transform key. One of linear, log10, log2, ln.

Returns:

pd.DataFrame – PTR table transformed to linear space.

Raises:

ValueError – When pretransformed_type is unsupported.
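The conversion can be pictured as a lookup of inverse transforms. The sketch below is illustrative only: to_linear_sketch is a hypothetical name, and it assumes the source values are plain logarithms with no pseudo-count, which this page does not state.

```python
import numpy as np
import pandas as pd

# Inverse of each supported source transform. Assumption: plain logs,
# no pseudo-count such as log2(x + 1).
_INVERSE = {
    "linear": lambda df: df,
    "none": lambda df: df,  # alias for linear
    "log10": lambda df: 10.0 ** df,
    "log2": lambda df: 2.0 ** df,
    "ln": np.exp,
}


def to_linear_sketch(ptr_df: pd.DataFrame, pretransformed_type: str = "linear") -> pd.DataFrame:
    """Illustrative stand-in for transform_ptr_to_linear, not the library code."""
    key = str(pretransformed_type).strip().lower()
    if key not in _INVERSE:
        raise ValueError(f"Unsupported pretransformed_type: {pretransformed_type!r}")
    return _INVERSE[key](ptr_df.astype(float))


log2_ptr = pd.DataFrame({"liver": [0.0, 1.0, 3.0]}, index=["g1", "g2", "g3"])
linear_ptr = to_linear_sketch(log2_ptr, "log2")
```

Missing values pass through unchanged, since exponentiating NaN yields NaN.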

static get_weights(df: DataFrame, col_stat_function: Callable[[Series], float]) → Series[source]

Generated: validation needed.

Description:: Compute per-column weighting ratios for within-sample imputation.

Parameters:

df (pd.DataFrame) – PTR frame in linear space.
col_stat_function (Callable[[pd.Series], float]) – Statistic function for column aggregation and global normalisation.

Returns:

pd.Series – Weight ratio per PTR column.
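The exact formula is not given on this page. A plausible reading, sketched below under that assumption, is that each column's statistic is divided by the same statistic computed over all observed values pooled together. The name column_weights_sketch is hypothetical.

```python
from typing import Callable

import pandas as pd


def column_weights_sketch(
    df: pd.DataFrame, col_stat_function: Callable[[pd.Series], float]
) -> pd.Series:
    """Illustrative stand-in for get_weights; the normalisation is an assumption."""
    # Statistic of each sample column, ignoring missing values.
    col_stats = pd.Series(
        {col: col_stat_function(df[col].dropna()) for col in df.columns}
    )
    # Assumed global normaliser: the same statistic over every observed value.
    pooled = pd.Series(df.to_numpy(dtype=float).ravel()).dropna()
    return col_stats / col_stat_function(pooled)


ptr = pd.DataFrame(
    {"liver": [1.0, 2.0, 3.0], "kidney": [2.0, 4.0, 6.0]},
    index=["g1", "g2", "g3"],
)
weights = column_weights_sketch(ptr, lambda s: float(s.median()))
```

With the median as the statistic, the column medians are 2 and 4 and the pooled median is 2.5, giving weights of 0.8 for liver and 1.6 for kidney.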

static _validate_within_sample_weighting(use_weighted: bool, weighted_statistic: str | None) → None[source]

Generated: validation needed.

Description:: Validate effective within-sample weighting inputs.

Parameters:

use_weighted (bool) – Weighted-imputation toggle.
weighted_statistic (str | None) – Column-statistic key for weighted mode.

Raises:

ValueError – When weighted mode is enabled but no weighted_statistic is supplied.

static _resolve_within_sample_stat_functions(use_weighted: bool, weighted_statistic: str | None, imputation_statistic: str) → tuple[Callable[[Series], float], Callable[[Series], float] | None][source]

Generated: validation needed.

Description:: Resolve callable statistic functions used by within-sample imputation.

Parameters:

use_weighted (bool) – Weighted-imputation toggle.
weighted_statistic (str | None) – Weighted-column statistic key.
imputation_statistic (str) – Row-wise statistic key.

Returns:

tuple[Callable[[pd.Series], float], Callable[[pd.Series], float] | None] – Row-statistic function and optional weighted-column statistic function.

Raises:

ValueError – When requested statistic keys are unsupported.

static impute_within_tissue_ptrs(ptr_df: DataFrame, use_weighted: bool = True, weighted_statistic: str | None = 'median', imputation_statistic: str = 'median') → DataFrame[source]

Generated: validation needed.

Description:: Impute missing values for genes observed in at least one sample. Weighted behaviour is controlled by use_weighted.

Parameters:

ptr_df (pd.DataFrame) – PTR table in linear space (genes × samples).
use_weighted (bool) – Apply weighted per-column scaling during within-sample imputation.
weighted_statistic (str | None) – Statistic for weighted column ratio.
imputation_statistic (str) – Statistic used for row-wise base fill.

Returns:

pd.DataFrame – PTR table with within-sample missing values filled.

Raises:

ValueError – When weighting configuration or statistic is unrecognised.
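As a rough illustration of how the row-wise base fill and the optional column weighting might interact, the sketch below fills each missing cell with the gene's median across samples, optionally scaled by a per-column weight. The multiplication rule, the pooled-median normaliser, and the name impute_within_sketch are all assumptions.

```python
import numpy as np
import pandas as pd


def impute_within_sketch(ptr_df: pd.DataFrame, use_weighted: bool = True) -> pd.DataFrame:
    """Illustrative stand-in for impute_within_tissue_ptrs (median only)."""
    # Row-wise base fill; stays NaN for genes with no observed value.
    row_fill = ptr_df.median(axis=1, skipna=True)
    if use_weighted:
        # Assumed weight: column median over the median of all observed values.
        pooled = pd.Series(ptr_df.to_numpy(dtype=float).ravel()).dropna()
        weights = ptr_df.median(axis=0, skipna=True) / pooled.median()
    else:
        weights = pd.Series(1.0, index=ptr_df.columns)
    # candidate[gene, sample] = row_fill[gene] * weights[sample]
    candidate = pd.DataFrame(
        np.outer(row_fill.to_numpy(), weights.to_numpy()),
        index=ptr_df.index,
        columns=ptr_df.columns,
    )
    # Observed cells are kept; only missing cells take the candidate value.
    return ptr_df.where(ptr_df.notna(), candidate)


ptr = pd.DataFrame(
    {"liver": [2.0, np.nan, np.nan], "kidney": [4.0, 6.0, np.nan]},
    index=["g1", "g2", "g3"],
)
unweighted = impute_within_sketch(ptr, use_weighted=False)
weighted = impute_within_sketch(ptr, use_weighted=True)
```

Gene g3 has no observed value in any sample, so it stays missing in both results and is left for the unobserved-gene step.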

static _resolve_unobserved_source_frame(ptr_df: DataFrame, strategy: str, reference_df: DataFrame | None) → DataFrame[source]

Generated: validation needed.

Description:: Resolve source PTR frame used to compute unobserved-gene fill statistics.

Parameters:

ptr_df (pd.DataFrame) – PTR frame after within-sample imputation.
strategy (str) – Unobserved-gene strategy.
reference_df (pd.DataFrame | None) – Optional pre-imputation frame.

Returns:

pd.DataFrame – Source frame for per-sample statistics.

Raises:

ValueError – When before-imputation strategy is selected without a reference frame.

static _compute_per_sample_fill_values(source_df: DataFrame, statistic: str) → dict[str, float][source]

Generated: validation needed.

Description:: Compute one fill value per sample column from chosen source frame.

Parameters:

source_df (pd.DataFrame) – Source frame used for statistics.
statistic (str) – Aggregation statistic key.

Returns:

dict[str, float] – Per-sample fill values.

Raises:

ValueError – When statistic key is unsupported.
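A minimal sketch of the per-sample computation follows, using the statistic keys listed for impute_unobserved_genes (median, mean, mode, max, min). The name per_sample_fill_sketch is hypothetical, and taking the smallest value when the mode is tied is an assumption.

```python
import numpy as np
import pandas as pd


def _first_mode(values: pd.Series) -> float:
    # Series.mode returns all tied modes in sorted order.
    # Assumption: take the smallest one.
    modes = values.mode()
    return float(modes.iloc[0]) if not modes.empty else float("nan")


_STATISTICS = {
    "median": lambda s: s.median(),
    "mean": lambda s: s.mean(),
    "mode": _first_mode,
    "max": lambda s: s.max(),
    "min": lambda s: s.min(),
}


def per_sample_fill_sketch(source_df: pd.DataFrame, statistic: str) -> dict:
    """Illustrative stand-in for _compute_per_sample_fill_values."""
    if statistic not in _STATISTICS:
        raise ValueError(f"Unsupported statistic: {statistic!r}")
    stat = _STATISTICS[statistic]
    # One value per sample column, computed on observed entries only.
    return {str(col): float(stat(source_df[col].dropna())) for col in source_df.columns}


source = pd.DataFrame(
    {"liver": [1.0, 2.0, 2.0, np.nan], "kidney": [4.0, 6.0, 8.0, 10.0]},
    index=["g1", "g2", "g3", "g4"],
)
median_fill = per_sample_fill_sketch(source, "median")
mode_fill = per_sample_fill_sketch(source, "mode")
```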

static _apply_global_unobserved_fill(df: DataFrame, fill_values: dict[str, float], target_gene_ids: set[str]) → DataFrame[source]

Generated: validation needed.

Description:: Fill missing cells using one global per-sample statistic.

Parameters:

df (pd.DataFrame) – Target PTR frame aligned to expression index.
fill_values (dict[str, float]) – Per-sample fallback values.
target_gene_ids (set[str]) – Gene IDs eligible for unobserved-gene fill.

Returns:

pd.DataFrame – Frame with missing values filled.

static _apply_grouped_unobserved_fill(df: DataFrame, source_df: DataFrame, statistic: str, special_gene_groups: dict[str, list[str]], fallback_fill_values: dict[str, float], target_gene_ids: set[str], trace: dict[str, Any] | None = None) → DataFrame[source]

Generated: validation needed.

Description:: Fill missing cells by special-gene groups with independent per-group statistics and global fallback.

Parameters:

df (pd.DataFrame) – Target PTR frame aligned to expression index.
source_df (pd.DataFrame) – Source frame for statistic calculation.
statistic (str) – Aggregation statistic key.
special_gene_groups (dict[str, list[str]]) – Group name to gene IDs.
fallback_fill_values (dict[str, float]) – Global per-sample fallback values.
target_gene_ids (set[str]) – Gene IDs eligible for unobserved-gene fill.
trace (dict[str, Any] | None) – Optional mutable trace dictionary populated with special-group mapping and assigned imputed values.

Returns:

pd.DataFrame – Frame with grouped missing-value imputation applied.

static _resolve_grouped_fill_value(*, column_name: Any, group_name: str | None, group_fill_values: dict[str, dict[str, float]], fallback_fill_values: dict[str, float]) → float[source]

Generated: validation needed.

Description:: Resolve one grouped unobserved-gene fill value with fallback.

Parameters:

column_name (Any) – Sample/column identifier.
group_name (str | None) – Optional resolved group name.
group_fill_values (dict[str, dict[str, float]]) – Per-group fill values.
fallback_fill_values (dict[str, float]) – Global fallback fill values.

Returns:

float – Assigned fill value.
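The fallback logic can be sketched as a nested dictionary lookup. This is illustrative only: resolve_fill_sketch is a hypothetical name, and treating a missing or NaN group value as a trigger for the global fallback is an assumption.

```python
import math
from typing import Any, Dict, Optional


def resolve_fill_sketch(
    *,
    column_name: Any,
    group_name: Optional[str],
    group_fill_values: Dict[str, Dict[str, float]],
    fallback_fill_values: Dict[str, float],
) -> float:
    """Illustrative stand-in for _resolve_grouped_fill_value."""
    if group_name is not None:
        value = group_fill_values.get(group_name, {}).get(column_name)
        # Assumption: an absent or NaN group statistic falls through.
        if value is not None and not math.isnan(value):
            return float(value)
    return float(fallback_fill_values[column_name])


group_values = {"transport_reactions": {"liver": 0.5, "kidney": float("nan")}}
fallback = {"liver": 2.0, "kidney": 3.0}

from_group = resolve_fill_sketch(
    column_name="liver",
    group_name="transport_reactions",
    group_fill_values=group_values,
    fallback_fill_values=fallback,
)
from_fallback = resolve_fill_sketch(
    column_name="kidney",
    group_name="transport_reactions",
    group_fill_values=group_values,
    fallback_fill_values=fallback,
)
```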

static impute_unobserved_genes(ptr_df: DataFrame, expression_df: DataFrame, unobserved_gene_ids: set[str], strategy: str = 'sample_after_imputation', statistic: str = 'median', reference_df: DataFrame | None = None, special_gene_groups: dict[str, list[str]] | None = None, use_special_groups: bool = False, trace: dict[str, Any] | None = None) → DataFrame[source]

Generated: validation needed.

Description:: Expand PTR to match the full gene index of expression_df. Genes present in expression but absent from PTR are filled using a per-sample statistic. sample_after_imputation computes the statistic on the incoming (already-imputed) PTR values. sample_before_imputation computes it on the pre-imputation snapshot, which the caller must supply as reference_df; a ValueError is raised when it is missing.

Parameters:

ptr_df (pd.DataFrame) – PTR table after within-sample imputation.
expression_df (pd.DataFrame) – Expression table whose index defines the target gene universe.
unobserved_gene_ids (set[str]) – Gene IDs present in expression but absent from PTR to be filled.
strategy (str) – Imputation strategy for unobserved genes. One of sample_after_imputation, sample_before_imputation.
statistic (str) – Per-sample aggregation statistic. One of median, mean, mode, max, min.
reference_df (pd.DataFrame | None) – Pre-within-imputation PTR frame used when strategy='sample_before_imputation'.
special_gene_groups (dict[str, list[str]] | None) – Optional special groups to impute independently.
use_special_groups (bool) – Enable special-group independent imputation behavior.
trace (dict[str, Any] | None) – Optional mutable trace dictionary populated with grouped-imputation diagnostics.

Returns:

pd.DataFrame – PTR table re-indexed to expression_df.index with unobserved genes filled.

Raises:

ValueError – When strategy/statistic is unrecognised or when sample_before_imputation lacks a reference frame.
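The difference between the two strategies is easiest to see in a small example. The sketch below covers only the ungrouped, median case. The name expand_to_expression_sketch is hypothetical, and it is not the library's implementation.

```python
from typing import Optional, Set

import numpy as np
import pandas as pd


def expand_to_expression_sketch(
    ptr_df: pd.DataFrame,
    expression_df: pd.DataFrame,
    unobserved_gene_ids: Set[str],
    strategy: str = "sample_after_imputation",
    reference_df: Optional[pd.DataFrame] = None,
) -> pd.DataFrame:
    """Illustrative stand-in for impute_unobserved_genes (median, no groups)."""
    if strategy == "sample_after_imputation":
        source = ptr_df
    elif strategy == "sample_before_imputation":
        if reference_df is None:
            raise ValueError("sample_before_imputation needs reference_df")
        source = reference_df
    else:
        raise ValueError(f"Unknown strategy: {strategy!r}")
    # One fill value per sample column, taken from the chosen source frame.
    fill_values = source.median(axis=0, skipna=True)
    # Align to the expression gene universe; new rows start as NaN.
    expanded = ptr_df.reindex(expression_df.index)
    targets = [gene for gene in expanded.index if gene in unobserved_gene_ids]
    expanded.loc[targets] = expanded.loc[targets].fillna(fill_values)
    return expanded


imputed = pd.DataFrame({"liver": [1.0, 3.0], "kidney": [10.0, 30.0]}, index=["g1", "g2"])
snapshot = pd.DataFrame({"liver": [1.0, np.nan], "kidney": [10.0, 30.0]}, index=["g1", "g2"])
expression = pd.DataFrame({"s1": [5.0, 6.0, 7.0]}, index=["g1", "g2", "g3"])

after = expand_to_expression_sketch(imputed, expression, {"g3"})
before = expand_to_expression_sketch(
    imputed, expression, {"g3"},
    strategy="sample_before_imputation", reference_df=snapshot,
)
```

In this example g3 receives the liver median of the imputed frame (2.0) under the first strategy, and the liver median of the pre-imputation snapshot (1.0) under the second.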

prepare_ptr_frame(ptr_df: DataFrame, expression_df: DataFrame, config: APIConfig, metabolic_genes: list[str] | None = None, model_artifact: Any | None = None) → DataFrame[source]

Generated: validation needed.

Description:: Full PTR preprocessing pipeline: standardize → deduplicate → optionally filter to metabolic genes → convert to linear scale → impute within-sample missing values → expand to expression gene index.

Parameters:

ptr_df (pd.DataFrame) – Raw PTR input table (genes × tissue-types).
expression_df (pd.DataFrame) – Preprocessed expression table used to define the target gene universe and guide imputation.
config (APIConfig) – Root API configuration. PTR options read from config.ptr.
metabolic_genes (list[str] | None) – Optional list of gene IDs from the metabolic model. When provided and config.ptr.impute_from_metabolic_genes_only is True, PTR is filtered to this set before imputation.
model_artifact (Any | None) – Optional cobra-like model used to expand shorthand special gene groups such as transport_reactions.

Returns:

pd.DataFrame – Fully preprocessed PTR table aligned to the expression gene index.

get_latest_preparation_diagnostics() → dict[str, Any][source]

Generated: validation needed.

Description:: Return the diagnostics captured during the most recent PTR preparation call.

Returns:: dict[str, Any] – PTR preparation diagnostics for inter-stage artifact persistence.

static resolve_special_gene_groups(config: APIConfig, model_artifact: Any | None = None, expression_gene_ids: set[str] | None = None) → dict[str, list[str]][source]

Generated: validation needed.

Description:: Resolve user-provided special gene groups used by PTR unobserved-gene imputation. This method enables independent group-wise imputation (e.g., transport genes or other custom partitions). Group values may contain gene IDs or reaction IDs; a transport_reactions group with an empty list auto-resolves transport-associated genes from the model.

Parameters:

config (APIConfig) – Root API configuration.
model_artifact (Any | None) – Optional cobra-like model used for shorthand and reaction-based group expansion.
expression_gene_ids (set[str] | None) – Optional expression-gene universe used to filter resolved group members.

Returns:

dict[str, list[str]] – Mapping of group name to normalized gene IDs.

Raises:

ValueError – When transport_reactions shorthand is requested without a model artifact.

classmethod requires_sample_type_map(ptr_method: str) → bool[source]

Generated: validation needed.

Description:: Report whether a PTR implementation strategy requires an explicit expression→PTR sample/tissue mapping.

Parameters:: ptr_method (str) – PTR strategy key from config.protein.ptr_method.
Returns:: bool – True when selected PTR method requires expression.sample_type_map.

static resolve_sample_type_map(expression_df: DataFrame, sample_type_map: dict[str, str] | str | None) → dict[str, str][source]

Generated: validation needed.

Description:

Build a {expression_column: ptr_column} mapping from the user-supplied sample_type_map.

None → identity map (each expression column maps to itself).
str → every expression column maps to that single PTR column.
dict → used directly; expression columns absent from the dict fall back to an identity mapping.

Labels are normalized for robust matching: lower-case, stripped whitespace, and _ptr suffix removed.

Parameters:

expression_df (pd.DataFrame) – Expression table whose columns define the source keys.
sample_type_map (dict[str, str] | str | None) – User-configured column mapping.

Returns:

dict[str, str] – Mapping of expression column → PTR column.

combine_expression_with_ptr(expression_df: DataFrame, ptr_df: DataFrame, sample_type_map: dict[str, str] | str | None = None) → DataFrame[source]

Generated: validation needed.

Description:: Multiply expression values by PTR values for each gene, using the resolved sample-type column mapping to pair expression columns with PTR columns. Genes absent from PTR retain their expression values.

Parameters:

expression_df (pd.DataFrame) – Preprocessed expression table (genes × expression-samples).
ptr_df (pd.DataFrame) – Preprocessed PTR table (genes × tissue-types).
sample_type_map (dict[str, str] | str | None) – Mapping from expression column names to PTR column names. str maps every expression column to the same PTR column; None pairs each expression column with the PTR column of the same name.

Returns:

pd.DataFrame – Combined protein abundance table with same shape as expression_df.
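The combination step amounts to a column-wise multiplication after aligning genes. The sketch below assumes an already-resolved expression-to-PTR column map. The name combine_sketch is hypothetical, and skipping pairs whose PTR column does not exist is an assumption.

```python
from typing import Dict

import pandas as pd


def combine_sketch(
    expression_df: pd.DataFrame, ptr_df: pd.DataFrame, column_map: Dict[str, str]
) -> pd.DataFrame:
    """Illustrative stand-in for combine_expression_with_ptr."""
    combined = expression_df.astype(float).copy()
    for expr_col, ptr_col in column_map.items():
        # Assumption: unmatched pairs are skipped, leaving expression as-is.
        if expr_col not in combined.columns or ptr_col not in ptr_df.columns:
            continue
        # Genes absent from PTR get a factor of 1.0 and keep their expression.
        # Simplification: a NaN PTR value for a present gene is treated the same.
        factor = ptr_df[ptr_col].reindex(combined.index).fillna(1.0)
        combined[expr_col] = combined[expr_col] * factor
    return combined


expression = pd.DataFrame(
    {"s1": [10.0, 20.0, 30.0], "s2": [1.0, 2.0, 3.0]}, index=["g1", "g2", "g3"]
)
ptr = pd.DataFrame({"liver": [2.0, 0.5]}, index=["g1", "g2"])
abundance = combine_sketch(expression, ptr, {"s1": "liver", "s2": "liver"})
```

The result has the same shape as the expression table; g3 is missing from the PTR frame and so keeps its expression values of 30.0 and 3.0.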