DefaultPTRImplementation
- class VmaxBuilder.protein.ptr_implementation.DefaultPTRImplementation
Generated: validation needed.
- Description:
PTR preprocessing and combination logic for the expression+PTR protein abundance pathway. Covers standardisation, deduplication, log→linear conversion, within-sample imputation, and expansion to the full expression gene index.
Public Methods
- resolve_ptr_frame(scaffold: Scaffold, config: APIConfig) → DataFrame | None
- Description:
Resolve PTR dataframe from configured scaffold/config sources.
- Parameters:
scaffold (Scaffold) – Shared pipeline scaffold.
config (APIConfig) – Root API configuration.
- Returns:
pd.DataFrame | None – PTR dataframe when available.
- static standardize_ptr_frame(ptr_df: DataFrame) → DataFrame
- Description:
Standardize missing-value tokens, trim string whitespace, and coerce all values to numeric. Resets integer-indexed frames to use the first column as the index. Normalises column names to lower-case.
- Parameters:
ptr_df (pd.DataFrame) – Raw PTR table (genes × samples).
- Returns:
pd.DataFrame – Numeric PTR frame with standardized missing values and lower-case column names.
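The standardisation steps described above can be illustrated with a minimal pandas sketch. The function name `standardize_ptr_frame_sketch` and the `MISSING_TOKENS` set are assumptions made for illustration; this reference does not list the tokens the real method recognises.

```python
import numpy as np
import pandas as pd

# Assumed missing-value tokens; the actual list is not documented here.
MISSING_TOKENS = {"", "na", "n/a", "nan", "null", "none", "-"}


def standardize_ptr_frame_sketch(ptr_df: pd.DataFrame) -> pd.DataFrame:
    df = ptr_df.copy()
    # An integer index suggests the gene IDs are still in the first column.
    if pd.api.types.is_integer_dtype(df.index):
        df = df.set_index(df.columns[0])
    # Normalise column names to trimmed lower-case labels.
    df.columns = [str(c).strip().lower() for c in df.columns]

    def clean(value):
        # Trim strings and turn known missing-value tokens into NaN.
        if isinstance(value, str):
            value = value.strip()
            if value.lower() in MISSING_TOKENS:
                return np.nan
        return value

    df = df.apply(lambda col: col.map(clean))
    # Anything that still is not numeric becomes NaN.
    return df.apply(pd.to_numeric, errors="coerce")
```

Coercing with `errors="coerce"` means unexpected strings become missing values rather than raising, which may or may not match the real method.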
- static remove_ptr_duplicates(ptr_df: DataFrame) → DataFrame
- Description:
For each duplicated gene row, retain the row with the most non-missing values. When tied, keep the first occurrence.
- Parameters:
ptr_df (pd.DataFrame) – PTR table potentially containing duplicate gene identifiers in the index.
- Returns:
pd.DataFrame – PTR table with unique gene index.
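The deduplication rule (keep the best-covered row, first occurrence on ties) could look roughly like the sketch below. The name `remove_ptr_duplicates_sketch` is hypothetical, and preserving the original row order in the result is an assumption.

```python
import pandas as pd


def remove_ptr_duplicates_sketch(ptr_df: pd.DataFrame) -> pd.DataFrame:
    # Number of non-missing values in each row.
    coverage = ptr_df.notna().sum(axis=1).to_numpy()
    order = pd.DataFrame(
        {
            "gene": ptr_df.index.to_numpy(),
            "coverage": coverage,
            "pos": list(range(len(ptr_df))),
        }
    )
    # Highest coverage first; original position breaks ties.
    ranked = order.sort_values(["coverage", "pos"], ascending=[False, True])
    best = ranked.drop_duplicates("gene")
    # Select the winning rows in their original order.
    keep = sorted(best["pos"])
    return ptr_df.iloc[keep]
```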
- static transform_ptr_to_linear(ptr_df: DataFrame, pretransformed_type: str = 'linear') → DataFrame
- Description:
Convert PTR frame to linear space from configured transform state. Supports 'none' as an alias for 'linear'.
- Parameters:
ptr_df (pd.DataFrame) – PTR table in source transform space.
pretransformed_type (str) – Source transform key. One of 'linear', 'log10', 'log2', 'ln'.
- Returns:
pd.DataFrame – PTR table transformed to linear space.
- Raises:
ValueError – When pretransformed_type is unsupported.
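As an illustration of the log→linear conversion, the sketch below inverts each supported transform by raising its base to the stored value. The function name and the `_LOG_BASES` table are assumptions; only the transform keys come from this reference.

```python
import numpy as np
import pandas as pd

# Base of each supported log transform (keys taken from the parameter list).
_LOG_BASES = {"log10": 10.0, "log2": 2.0, "ln": float(np.e)}


def transform_ptr_to_linear_sketch(
    ptr_df: pd.DataFrame, pretransformed_type: str = "linear"
) -> pd.DataFrame:
    # 'none' is treated as an alias for 'linear': nothing to undo.
    if pretransformed_type in ("linear", "none"):
        return ptr_df.copy()
    if pretransformed_type not in _LOG_BASES:
        raise ValueError(f"Unsupported pretransformed_type: {pretransformed_type!r}")
    # The inverse of log_b(x) is b ** x; missing values stay missing.
    return _LOG_BASES[pretransformed_type] ** ptr_df
```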
- static get_weights(df: DataFrame, col_stat_function: Callable[[Series], float]) → Series
- Description:
Compute per-column weighting ratios for within-sample imputation.
- Parameters:
df (pd.DataFrame) – PTR frame in linear space.
col_stat_function (Callable[[pd.Series], float]) – Statistic function for column aggregation and global normalisation.
- Returns:
pd.Series – Weight ratio per PTR column.
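One plausible reading of "column aggregation and global normalisation" is that each column's statistic is divided by the same statistic taken over all column statistics. The sketch below implements that reading only; the exact normalisation used by the real method is not specified here, and `get_weights_sketch` is a hypothetical name.

```python
from typing import Callable

import pandas as pd


def get_weights_sketch(
    df: pd.DataFrame, col_stat_function: Callable[[pd.Series], float]
) -> pd.Series:
    # One statistic per sample column, ignoring missing values.
    col_stats = df.apply(lambda col: col_stat_function(col.dropna()))
    # Assumed global normaliser: the same statistic over the column statistics.
    global_stat = col_stat_function(col_stats.dropna())
    return col_stats / global_stat
```

Under this reading, a weight above 1 marks a sample whose PTR values run higher than a typical sample.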
- static _validate_within_sample_weighting(use_weighted: bool, weighted_statistic: str | None) → None
- Description:
Validate effective within-sample weighting inputs.
- Parameters:
use_weighted (bool) – Weighted-imputation toggle.
weighted_statistic (str | None) – Column-statistic key for weighted mode.
- Raises:
ValueError – When weighted mode lacks a strategy statistic.
- static _resolve_within_sample_stat_functions(use_weighted: bool, weighted_statistic: str | None, imputation_statistic: str) → tuple[Callable[[Series], float], Callable[[Series], float] | None]
- Description:
Resolve callable statistic functions used by within-sample imputation.
- Parameters:
use_weighted (bool) – Weighted-imputation toggle.
weighted_statistic (str | None) – Weighted-column statistic key.
imputation_statistic (str) – Row-wise statistic key.
- Returns:
tuple[Callable[[pd.Series], float], Callable[[pd.Series], float] | None] – Row-statistic function and optional weighted-column statistic function.
- Raises:
ValueError – When requested statistic keys are unsupported.
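The validation and lookup performed by these two helpers might be sketched as a single function over a table of named statistics. The `STAT_FUNCTIONS` table and its keys are assumptions (borrowed from the statistic keys listed for unobserved-gene imputation), as is the function name.

```python
from typing import Callable, Optional, Tuple

import pandas as pd

StatFn = Callable[[pd.Series], float]

# Assumed statistic table; the keys accepted by the real helper are not listed.
STAT_FUNCTIONS = {
    "median": lambda s: float(s.median()),
    "mean": lambda s: float(s.mean()),
    "max": lambda s: float(s.max()),
    "min": lambda s: float(s.min()),
}


def resolve_stat_functions_sketch(
    use_weighted: bool, weighted_statistic: Optional[str], imputation_statistic: str
) -> Tuple[StatFn, Optional[StatFn]]:
    # Weighted mode is meaningless without a column statistic.
    if use_weighted and weighted_statistic is None:
        raise ValueError("weighted imputation requires weighted_statistic")

    def lookup(key: str) -> StatFn:
        if key not in STAT_FUNCTIONS:
            raise ValueError(f"Unsupported statistic: {key!r}")
        return STAT_FUNCTIONS[key]

    row_fn = lookup(imputation_statistic)
    col_fn = lookup(weighted_statistic) if use_weighted else None
    return row_fn, col_fn
```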
- static impute_within_tissue_ptrs(ptr_df: DataFrame, use_weighted: bool = True, weighted_statistic: str | None = 'median', imputation_statistic: str = 'median') → DataFrame
- Description:
Impute missing values for genes observed in at least one sample. Weighted behaviour is controlled by use_weighted.
- Parameters:
ptr_df (pd.DataFrame) – PTR table in linear space (genes × samples).
use_weighted (bool) – Apply weighted per-column scaling during within-sample imputation.
weighted_statistic (str | None) – Statistic for weighted column ratio.
imputation_statistic (str) – Statistic used for row-wise base fill.
- Returns:
pd.DataFrame – PTR table with within-sample missing values filled.
- Raises:
ValueError – When weighting configuration or statistic is unrecognised.
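A minimal sketch of within-sample imputation, assuming the fill for a missing cell is the gene's row median, scaled in weighted mode by a per-column ratio (column median over the median of column medians). Both the formula and the name `impute_within_sketch` are assumptions; the statistics are fixed to the median for brevity.

```python
import numpy as np
import pandas as pd


def impute_within_sketch(ptr_df: pd.DataFrame, use_weighted: bool = True) -> pd.DataFrame:
    # Row-wise base fill; stays NaN for genes with no observed value at all.
    row_base = ptr_df.median(axis=1, skipna=True)
    if use_weighted:
        col_stats = ptr_df.median(axis=0, skipna=True)
        weights = col_stats / col_stats.median()
    else:
        weights = pd.Series(1.0, index=ptr_df.columns)
    # Candidate fill for every (gene, sample) cell: row base times column weight.
    fill = pd.DataFrame(
        np.outer(row_base.to_numpy(), weights.to_numpy()),
        index=ptr_df.index,
        columns=ptr_df.columns,
    )
    # Only missing cells are replaced; observed values are untouched.
    return ptr_df.fillna(fill)
```

Genes that are missing in every sample are left as NaN here, since they have no row statistic to draw on.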
- static _resolve_unobserved_source_frame(ptr_df: DataFrame, strategy: str, reference_df: DataFrame | None) → DataFrame
- Description:
Resolve source PTR frame used to compute unobserved-gene fill statistics.
- Parameters:
ptr_df (pd.DataFrame) – PTR frame after within-sample imputation.
strategy (str) – Unobserved-gene strategy.
reference_df (pd.DataFrame | None) – Optional pre-imputation frame.
- Returns:
pd.DataFrame – Source frame for per-sample statistics.
- Raises:
ValueError – When before-imputation strategy is selected without a reference frame.
- static _compute_per_sample_fill_values(source_df: DataFrame, statistic: str) → dict[str, float]
- Description:
Compute one fill value per sample column from chosen source frame.
- Parameters:
source_df (pd.DataFrame) – Source frame used for statistics.
statistic (str) – Aggregation statistic key.
- Returns:
dict[str, float] – Per-sample fill values.
- Raises:
ValueError – When statistic key is unsupported.
- static _apply_global_unobserved_fill(df: DataFrame, fill_values: dict[str, float], target_gene_ids: set[str]) → DataFrame
- Description:
Fill missing cells using one global per-sample statistic.
- Parameters:
df (pd.DataFrame) – Target PTR frame aligned to expression index.
fill_values (dict[str, float]) – Per-sample fallback values.
target_gene_ids (set[str]) – Gene IDs eligible for unobserved-gene fill.
- Returns:
pd.DataFrame – Frame with missing values filled.
- static _apply_grouped_unobserved_fill(df: DataFrame, source_df: DataFrame, statistic: str, special_gene_groups: dict[str, list[str]], fallback_fill_values: dict[str, float], target_gene_ids: set[str], trace: dict[str, Any] | None = None) → DataFrame
- Description:
Fill missing cells by special-gene groups with independent per-group statistics and global fallback.
- Parameters:
df (pd.DataFrame) – Target PTR frame aligned to expression index.
source_df (pd.DataFrame) – Source frame for statistic calculation.
statistic (str) – Aggregation statistic key.
special_gene_groups (dict[str, list[str]]) – Group name to gene IDs.
fallback_fill_values (dict[str, float]) – Global per-sample fallback values.
target_gene_ids (set[str]) – Gene IDs eligible for unobserved-gene fill.
trace (dict[str, Any] | None) – Optional mutable trace dictionary populated with special-group mapping and assigned imputed values.
- Returns:
pd.DataFrame – Frame with grouped missing-value imputation applied.
- static _resolve_grouped_fill_value(*, column_name: Any, group_name: str | None, group_fill_values: dict[str, dict[str, float]], fallback_fill_values: dict[str, float]) → float
- Description:
Resolve one grouped unobserved-gene fill value with fallback.
- Parameters:
column_name (Any) – Sample/column identifier.
group_name (str | None) – Optional resolved group name.
group_fill_values (dict[str, dict[str, float]]) – Per-group fill values.
fallback_fill_values (dict[str, float]) – Global fallback fill values.
- Returns:
float – Assigned fill value.
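The group-then-fallback lookup can be sketched with plain dictionaries. Treating a missing or NaN group value as "fall back to the global value" is an assumption, and the function name is hypothetical.

```python
import math
from typing import Any, Dict, Optional


def resolve_grouped_fill_value_sketch(
    *,
    column_name: Any,
    group_name: Optional[str],
    group_fill_values: Dict[str, Dict[str, float]],
    fallback_fill_values: Dict[str, float],
) -> float:
    if group_name is not None:
        # Look for a usable value for this group in this sample column.
        value = group_fill_values.get(group_name, {}).get(column_name)
        if value is not None and not math.isnan(value):
            return float(value)
    # No group, unknown group, or unusable group value: use the global one.
    return float(fallback_fill_values[column_name])
```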
- static impute_unobserved_genes(ptr_df: DataFrame, expression_df: DataFrame, unobserved_gene_ids: set[str], strategy: str = 'sample_after_imputation', statistic: str = 'median', reference_df: DataFrame | None = None, special_gene_groups: dict[str, list[str]] | None = None, use_special_groups: bool = False, trace: dict[str, Any] | None = None) → DataFrame
- Description:
Expand PTR to match the full gene index of expression_df. Genes present in expression but absent from PTR are filled using a per-sample statistic. sample_after_imputation computes the statistic on the incoming (already-imputed) PTR values; sample_before_imputation computes it on the pre-imputation snapshot, which the caller must supply as reference_df.
- Parameters:
ptr_df (pd.DataFrame) – PTR table after within-sample imputation.
expression_df (pd.DataFrame) – Expression table whose index defines the target gene universe.
unobserved_gene_ids (set[str]) – Gene IDs present in expression but absent from PTR to be filled.
strategy (str) – Imputation strategy for unobserved genes. One of 'sample_after_imputation', 'sample_before_imputation'.
statistic (str) – Per-sample aggregation statistic. One of 'median', 'mean', 'mode', 'max', 'min'.
reference_df (pd.DataFrame | None) – Pre-within-imputation PTR frame used when strategy='sample_before_imputation'.
special_gene_groups (dict[str, list[str]] | None) – Optional special groups to impute independently.
use_special_groups (bool) – Enable special-group independent imputation behavior.
trace (dict[str, Any] | None) – Optional mutable trace dictionary populated with grouped-imputation diagnostics.
- Returns:
pd.DataFrame – PTR table re-indexed to expression_df.index with unobserved genes filled.
- Raises:
ValueError – When strategy/statistic is unrecognised or when sample_before_imputation lacks a reference frame.
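The global (non-grouped) path can be sketched as: pick the source frame from the strategy, compute one median per sample column, re-index to the expression genes, and fill only the rows listed as unobserved. The function name is hypothetical, the statistic is fixed to the median, and special-group handling is omitted.

```python
from typing import Optional, Set

import pandas as pd


def impute_unobserved_sketch(
    ptr_df: pd.DataFrame,
    expression_df: pd.DataFrame,
    unobserved_gene_ids: Set[str],
    strategy: str = "sample_after_imputation",
    reference_df: Optional[pd.DataFrame] = None,
) -> pd.DataFrame:
    if strategy == "sample_after_imputation":
        source = ptr_df
    elif strategy == "sample_before_imputation":
        if reference_df is None:
            raise ValueError("sample_before_imputation requires reference_df")
        source = reference_df
    else:
        raise ValueError(f"Unrecognised strategy: {strategy!r}")
    # One fill value per sample column.
    fill_values = source.median(axis=0, skipna=True)
    # Expand to the expression gene universe; new rows start as NaN.
    expanded = ptr_df.reindex(expression_df.index)
    target = expanded.index.isin(list(unobserved_gene_ids))
    # Fill only the eligible rows, column by column.
    expanded.loc[target] = expanded.loc[target].fillna(fill_values)
    return expanded
```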
- prepare_ptr_frame(ptr_df: DataFrame, expression_df: DataFrame, config: APIConfig, metabolic_genes: list[str] | None = None, model_artifact: Any | None = None) → DataFrame
- Description:
Full PTR preprocessing pipeline: standardize → deduplicate → optionally filter to metabolic genes → convert to linear scale → impute within-sample missing values → expand to expression gene index.
- Parameters:
ptr_df (pd.DataFrame) – Raw PTR input table (genes × tissue-types).
expression_df (pd.DataFrame) – Preprocessed expression table used to define the target gene universe and guide imputation.
config (APIConfig) – Root API configuration. PTR options read from config.ptr.
metabolic_genes (list[str] | None) – Optional list of gene IDs from the metabolic model. When provided and config.ptr.impute_from_metabolic_genes_only is True, PTR is filtered to this set before imputation.
model_artifact (Any | None) – Optional cobra-like model used to expand shorthand special gene groups such as transport_reactions.
- Returns:
pd.DataFrame – Fully preprocessed PTR table aligned to the expression gene index.
- get_latest_preparation_diagnostics() → dict[str, Any]
- Description:
Return diagnostics captured during latest PTR preparation call.
- Returns:
dict[str, Any] – PTR preparation diagnostics for inter-stage artifact persistence.
- static resolve_special_gene_groups(config: APIConfig, model_artifact: Any | None = None, expression_gene_ids: set[str] | None = None) → dict[str, list[str]]
- Description:
Resolve user-provided special gene groups used by PTR unobserved-gene imputation. This method enables independent group-wise imputation (e.g., transport genes or other custom partitions). Group values may contain gene IDs or reaction IDs; transport_reactions with an empty list auto-resolves transport-associated genes from the model.
- Parameters:
config (APIConfig) – Root API configuration.
model_artifact (Any | None) – Optional cobra-like model used for shorthand and reaction-based group expansion.
expression_gene_ids (set[str] | None) – Optional expression-gene universe used to filter resolved group members.
- Returns:
dict[str, list[str]] – Mapping of group name to normalized gene IDs.
- Raises:
ValueError – When the transport_reactions shorthand is requested without a model artifact.
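To make the resolution rules concrete, the sketch below replaces the cobra-like model with two hypothetical inputs: `reaction_to_genes` (reaction ID to gene IDs) and `transport_reaction_ids`. Neither name comes from the library, and gene-ID normalisation is left out.

```python
from typing import Dict, List, Optional, Set


def resolve_special_gene_groups_sketch(
    groups: Dict[str, List[str]],
    reaction_to_genes: Optional[Dict[str, List[str]]] = None,  # stands in for the model
    transport_reaction_ids: Optional[Set[str]] = None,  # hypothetical model query result
    expression_gene_ids: Optional[Set[str]] = None,
) -> Dict[str, List[str]]:
    resolved: Dict[str, List[str]] = {}
    for name, members in groups.items():
        if name == "transport_reactions" and not members:
            # The shorthand needs model information to expand.
            if reaction_to_genes is None:
                raise ValueError("transport_reactions shorthand requires a model")
            members = sorted(transport_reaction_ids or [])
        genes: List[str] = []
        for member in members:
            # A known reaction ID expands to its genes; anything else is a gene ID.
            genes.extend((reaction_to_genes or {}).get(member, [member]))
        unique = list(dict.fromkeys(genes))  # de-duplicate, keep first-seen order
        if expression_gene_ids is not None:
            unique = [g for g in unique if g in expression_gene_ids]
        resolved[name] = unique
    return resolved
```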
- classmethod requires_sample_type_map(ptr_method: str) → bool
- Description:
Report whether a PTR implementation strategy requires an explicit expression→PTR sample/tissue mapping.
- Parameters:
ptr_method (str) – PTR strategy key from config.protein.ptr_method.
- Returns:
bool – True when the selected PTR method requires expression.sample_type_map.
- static resolve_sample_type_map(expression_df: DataFrame, sample_type_map: dict[str, str] | str | None) → dict[str, str]
- Description:
Build a {expression_column: ptr_column} mapping from the user-supplied sample_type_map.
None → identity map (each expression column maps to itself).
str → every expression column maps to that single PTR column.
dict → used directly; expression columns absent from the dict fall back to an identity mapping.
Labels are normalized for robust matching: lower-case, stripped whitespace, and _ptr suffix removed.
- Parameters:
expression_df (pd.DataFrame) – Expression table whose columns define the source keys.
sample_type_map (dict[str, str] | str | None) – User-configured column mapping.
- Returns:
dict[str, str] – Mapping of expression column → PTR column.
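The three input cases can be sketched over a plain list of expression column names (a simplification; the real method takes the expression DataFrame). Exactly which labels get normalised is not stated here, so this sketch assumes dictionary keys are matched after normalisation and that every target PTR label is returned in normalised form.

```python
from typing import Dict, List, Optional, Union


def normalize_label(label: object) -> str:
    # Lower-case, strip whitespace, and drop a trailing "_ptr".
    text = str(label).strip().lower()
    return text[: -len("_ptr")] if text.endswith("_ptr") else text


def resolve_sample_type_map_sketch(
    expression_columns: List[str],
    sample_type_map: Optional[Union[Dict[str, str], str]],
) -> Dict[str, str]:
    if sample_type_map is None:
        # Identity map (up to label normalisation).
        return {col: normalize_label(col) for col in expression_columns}
    if isinstance(sample_type_map, str):
        # Every expression column shares one PTR column.
        target = normalize_label(sample_type_map)
        return {col: target for col in expression_columns}
    lookup = {normalize_label(k): normalize_label(v) for k, v in sample_type_map.items()}
    # Columns missing from the dict fall back to the identity mapping.
    return {
        col: lookup.get(normalize_label(col), normalize_label(col))
        for col in expression_columns
    }
```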
- combine_expression_with_ptr(expression_df: DataFrame, ptr_df: DataFrame, sample_type_map: dict[str, str] | str | None = None) → DataFrame
- Description:
Multiply expression values by PTR values for each gene, using the resolved sample-type column mapping to pair expression columns with PTR columns. Genes absent from PTR retain their expression values.
- Parameters:
expression_df (pd.DataFrame) – Preprocessed expression table (genes × expression-samples).
ptr_df (pd.DataFrame) – Preprocessed PTR table (genes × tissue-types).
sample_type_map (dict[str, str] | str | None) – Mapping from expression column names to PTR column names. A str maps every expression column to the same PTR column; None falls back to direct column intersection.
- Returns:
pd.DataFrame – Combined protein abundance table with the same shape as expression_df.
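The multiplication step can be sketched as below, taking an already resolved column map. Using a factor of 1 for genes absent from PTR is one way to make them "retain their expression values"; applying the same factor to PTR cells that are still missing is an assumption, as is the function name.

```python
from typing import Dict

import pandas as pd


def combine_expression_with_ptr_sketch(
    expression_df: pd.DataFrame, ptr_df: pd.DataFrame, column_map: Dict[str, str]
) -> pd.DataFrame:
    combined = expression_df.astype(float)
    for expr_col, ptr_col in column_map.items():
        # Skip pairs that cannot be matched on either side.
        if expr_col not in combined.columns or ptr_col not in ptr_df.columns:
            continue
        # Align PTR to the expression genes; a factor of 1 leaves expression as-is.
        factor = ptr_df[ptr_col].reindex(combined.index).fillna(1.0)
        combined[expr_col] = combined[expr_col] * factor
    return combined
```

For example, with `column_map = {"s1": "liver", "s2": "liver"}`, both expression samples are scaled by the same liver PTR column.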