生存分析：优化Cox模型的部分似然

介绍
Cox比例风险模型
优化问题
实施
结论
参考文献

1.介绍

生存分析涵盖了一系列用于描述时间到事件数据的统计方法。

在本文中，我们介绍了一种流行的生存分析算法，Cox比例风险模型¹。然后，我们定义了其对数部分似然和梯度，并通过一个实际的Python示例对其进行优化，以找到最佳的模型参数集。

2.Cox比例风险模型

我们将生存率定义为在一定时间段后未经历不良事件（例如死亡）的患者百分比。

Cox比例风险模型可以评估变量与生存率之间的关联。给定一组协变量x，它将风险函数定义为：

从公式中我们可以观察到，风险函数h(t|x)与基线风险函数h₀(t)和相对风险exp(βx)成比例。

基础风险函数h₀(t)不依赖于协变量。由于h₀(.)的形式未指定，该模型是半参数化的。

让我们通过一个仅涉及一个协变量的简化场景来解释模型系数的含义。我们考虑一个风险因素xᵢ，例如吸烟，作为二进制变量（0：非吸烟者 vs. 1：吸烟者）。Cox模型可以表示为h(t|xᵢ)= h₀(t)exp(βxᵢ)，其中exp(β)表示吸烟相对于不吸烟的不良事件的相对风险：

吸烟导致的风险：(xᵢ=1): h₀(t)exp(β⋅xᵢ) = h₀(t)exp(β⋅1) = h₀(t)exp(β)
非吸烟导致的风险：(xᵢ=0): h₀(t)exp(β⋅xᵢ) = h₀(t)exp(β⋅0) = h₀(t)
相对风险 = 吸烟导致的风险 / 非吸烟导致的风险：h₀(t)exp(β) / h₀(t) = exp(β)

相对风险exp(β) — 也称为风险比 — 是常数，不随时间变化。

3.优化问题

在数据科学中，“拟合”模型到数据集的任务表示寻找一组模型参数，以优化某个特定的目标函数，例如最小化损失函数或最大化对数似然。

在我们的情况下，我们需要在不知道h₀(.)的情况下估计β。为此，Cox提出最大化部分似然²：

在上述方程中：

K是按时间顺序排序的事件（死亡）时间的集合：t₁ < t₂ < … <tₖ。
R(tⱼ)标识时间tⱼ时处于风险中的受试者集合。

直观地说，部分似然是在观察到的事件时间集合中，根据在这些时间点上处于风险中的患者集合和比例风险假设下，看到不良事件的条件概率的乘积。

我们可以观察到：

L(β)与ho(t)无关，ho(t)可以保持未指定。
L(β)不考虑实际事件时间，只考虑其顺序。

我们可以将对数部分似然推导为：

在上述方程中：

N是受试者数量。
θ = exp(βx)。
δⱼ表示事件（1：死亡，0：其他）。

为了拟合Cox模型，需要找到将负对数部分似然最小化的β系数。

我们回顾一下，负部分似然在大多数情况下是一个严格凸函数³。因此，它具有唯一的全局最小值。

4.实施

让我们导入所需的库：

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import minimize
from sksurv.datasets import load_whas500

我们加载了scikit-survival⁵软件包中提供的Worcester心脏病发作研究数据集⁴。特别地：

我们关注两个协变量：

afb：心房颤动（0：否，1：是）
mitype：心肌梗死类型（0：非Q波，1：Q波）

我们调整数据以考虑并列情况，即在同一时间发生不良事件的患者。由于连续风险的假设，Cox模型不允许存在并列情况。为了简单起见，我们向每个事件日期添加了一小部分随机噪声，以将它们排除。

我们按日期对数据集进行排序，因为部分似然需要有序的事件时间。

# load the whas500 dataset
X, target = load_whas500()

# let us consider two covariates
cols = ["afb", "mitype"]

df = X[cols].\
        rename(columns={cols[0]: "v1", 
                        cols[1]: "v2"}).\
        astype(float)

# extract events and respective times
df["event"], df["time"] = [list(i) for i in zip(*target)]

# add random noise to the event time to avoid ties
df.time = df.time.apply(lambda x : x + np.random.normal(2, 1))

# sort observations by descending event time
df = df.sort_values("time", ascending=False).reset_index(drop=True)

# inspect first rows
df.head(10)

v = df[["v1", "v2"]].to_numpy()"v1", "v2"]].to_numpy()
time, event = df.time.to_numpy(), df.event.to_numpy().astype(int)

现在我们需要为优化任务定义目标函数，即我们将要最小化的负对数部分似然：

注意：在标准机器学习问题中，X通常描述输入特征。然而在我们的情况下，未知变量是β，我们正在尝试找到最佳的一组值。

def get_theta(x):
    '''
    Return theta as per negative log-partial likelihood
    of the Cox model and its gradient.
    It assumes input features v to be ordered by event time.

    Args:
        - x: beta coefficients 

    Output:
        - theta_l: cumulative theta <numpy.ndarray>
        - theta_l_v: cumulative theta by features <numpy.ndarray>
    '''
    theta = np.exp(np.dot(v, x))
    theta_l = np.cumsum(theta)
    theta_l_v = np.cumsum(v * theta.reshape(-1,1), axis=0)
    return theta_l, theta_l_v


def objective_function(x):
    '''
    Return the negative log-partial likelihood 
    of the Cox model

    Args:
        - x: beta coefficients <numpy.ndarray>
    Output:
        - Negative log-partial likelihood of the Cox model
    '''
    theta_l, _ = get_theta(x)
    return -np.sum(event * (np.dot(v, x) - np.log(theta_l)))

我们导出目标函数的梯度，即相对于β的导数，如下所示：

def gradient(x):
    '''
    Return the gradient of the negative log-partial
    likelihood of the Cox model

    Args:
        - x: beta coefficients <numpy.ndarray>
    Output:
        - Gradient of the objective function
    '''
    theta_l, theta_l_v = get_theta(x)
    return -np.sum(event.reshape(-1,1) * (v-(theta_l_v/theta_l.reshape(-1,1))), axis=0)

现在我们可以初始化β并执行最小化任务。我们使用Newton-CG⁶算法和scipy包：

# starting values for beta
beta = np.array([1, 1])

opt_result = minimize(
    objective_function,
    beta, 
    method = "Newton-CG",
    jac = gradient
)

opt_result

结果为：

β₁ = 0.5293
β₂ = -0.6541

我们可以使用库在相同的输入数据上拟合Cox模型，并验证我们将获得相同的β值集：

from sksurv.linear_model import CoxPHSurvivalAnalysis

model = CoxPHSurvivalAnalysis()
model_fit = model.fit(
    df[["v1", "v2"]], 
    np.array(list(zip(df.event, df.time)), dtype=np.dtype("bool, float")))

model_fit.coef_

实际上，β系数几乎相同，在小数点后第七位之后有细微差异。

让我们绘制估计的最优值和目标函数：

def objective_function_in_x(x0, x1):
    '''
    Return the negative log-partial likelihood 
    of the Cox model evaluated in the given beta

    Args:
        - x0: input beta_0 <numpy.ndarray>
        - x1: input beta_1 <numpy.ndarray>
    Output:
        - objective function in beta_0, beta_1 <numpy.ndarray>
    '''
    v0, v1, l = v[:,0], v[:,1], v.shape[0]
    theta = np.exp(x0*v0 + x1*v1)
    return -np.sum(event * ((x0*v0 + x1*v1) - np.log(np.cumsum(theta).reshape(-1, l))))

def get_plot_data(inf=-5, sup=5, size=10):
    '''
    Return a three-dim square box with the evaluation
    of the negative log-partial likelihood of the Cox model

    Args:
      - inf: min value of the box, default: -5 <int>
      - sup: min value of the box, default: 5 <int>
      - size: size of the output coordinates arrays, default: 10 <int>
    Output:
      - x0: input beta_0 <numpy.ndarray>
      - x1: input beta_1 <numpy.ndarray>
      - z: objective function in beta_0, beta_1 <numpy.ndarray>
    '''
    x0, x1 = np.linspace(inf, sup, size), np.linspace(inf, sup, size)
    x0, x1 = np.meshgrid(x0, x1)
    z = np.zeros((size, size))
    for i in range(0, x0.shape[0]):
        for j in range(0, x0.shape[1]):
            z[i][j] = objective_function_in_x(x0[i][j], x1[i][j])
    return x0, x1, z

def get_min_obj_function(model):
    '''
    Return coordinates of local optimum found with optimization

    Args:
      - model: instance of <scipy.optimize._optimize.OptimizeResult>
    Output:
      - x0: optimum for beta_0 <numpy.ndarray>
      - x1: optimum for beta_1 <numpy.ndarray>
      - z: objective function in the optimum <numpy.ndarray>
    '''
    x0, x1 = np.array(model.x[0]), np.array(model.x[1])
    z = objective_function_in_x(x0, x1)
    return x0, x1, z

# generate data
x0, x1, z = get_plot_data(-5, 5, 10)
x0_min, x1_min, z_min = get_min_obj_function(opt_result)

# plot the objective function and the local optimum
fig = plt.figure(figsize=plt.figaspect(0.4))

# left subplot
ax = fig.add_subplot(1, 2, 1, projection="3d")
ax.contour3D(x0, x1, z, 100, cmap="viridis")
ax.scatter(x0_min, x1_min, z_min, marker="o", color="red", linewidth=50000)
ax.set_xlabel("$β_1$")
ax.set_ylabel("$β_2$")
ax.set_zlabel("- Log-Partial Likelihood")

# right subplot
ax = fig.add_subplot(1, 2, 2, projection="3d")
ax.contour3D(x0, x1, z, 100, cmap="viridis")
ax.scatter(x0_min, x1_min, z_min, marker="o", color="red", linewidth=50000)
ax.set_xlabel("$β_1$")
ax.set_ylabel("$β_2$")
ax.set_zlabel("- Log-Partial Likelihood")
ax.view_init(10, 30)
fig.suptitle("Negative log-partial likelihood of the Cox model with local optimum", fontsize=10);

注意：使用先前定义的函数解决的优化问题可以具有任意数量的输入变量v。然而，3D图只需要考虑两个变量。实际上，3D图只能在每个轴上显示一个β系数。

从图中可以看出，负对数部分似然是一个凸损失函数。

5.结论

在生存分析的背景下，我们介绍了Cox比例风险模型，并在输入数据上拟合了它。特别是，我们用Python编写了负对数部分似然及其梯度。然后，我们将其最小化，以找到最佳的模型参数集。

6.参考文献

[1] D. R. Cox, Regression Models and Life-Tables, Journal of the Royal Statistical Society. Series B (Methodological), Vol. 34, n°2., pp. 187–220, 1972.

[2] https://en.wikipedia.org/wiki/Likelihood_function

[3] C. M. Winson et al., Fenchel duality of Cox partial likelihood with an application in survival kernel learning, Artificial Intelligence in Medicine,
vol. 116, 102077, 2021.

[4] S. Pölsterl, scikit-survival: A Library for Time-to-Event Analysis Built on Top of scikit-learn, Journal of Machine Learning Research, vol. 21, no. 212, pp. 1–6, 2020 (Package website).

[5] https://scikit-survival.readthedocs.io/en/stable/api/generated/sksurv.datasets.load_whas500.html

[6] https://docs.scipy.org/doc/scipy/reference/optimize.minimize-newtoncg.html#optimize-minimize-newtoncg

注意：whas500数据集可以从scikit-survival软件包中免费使用。scikit-survival软件包基于GPL v3许可。

✄-----------------------------------------------

看到这里，说明你喜欢这篇文章，请点击「在看」或顺手「转发」「点赞」。

欢迎微信搜索「panchuangxx」，添加小编磐小小仙微信，每日朋友圈更新一篇高质量推文（无广告），为您提供更多精彩内容。

▼ ▼ 扫描二维码添加小编 ▼ ▼