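Kaggle HMS: 2nd-place solution!
In this post we share the 2nd-place solution to Kaggle HMS. Much of the reason it reached 2nd place comes down to one strange experimental result.
We fed bs x 4 x H x W data into a 2D-CNN model, and it ended up performing worse than the single-image approach. I started wondering why.
So I decided to use a 3D-CNN on the spectrograms and a 2D-CNN on the raw EEG signal.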
01
Spectrogram model
The spectrogram model is a 3D CNN (X3D-L). Preprocessing uses an STFT transform.
Corresponding code:
import torch
import torch.nn as nn
import torchaudio


class Transform50s(nn.Module):
    """Magnitude STFT (power=1) of the full EEG window, hop_length=50."""

    def __init__(self):
        super().__init__()
        self.wave_transform = torchaudio.transforms.Spectrogram(
            n_fft=512, win_length=128, hop_length=50, power=1)

    def forward(self, x):
        image = self.wave_transform(x)
        # Clip outliers, then rescale to the range [0, 10].
        image = torch.clip(image, min=0, max=10000) / 1000
        n, c, h, w = image.size()
        # Keep only the lowest frequency bins (20% of the bins, plus 10).
        image = image[:, :, :int(20 / 100 * h + 10), :]
        return image


class Transform10s(nn.Module):
    """Same as Transform50s, but with hop_length=10 for the central slice."""

    def __init__(self):
        super().__init__()
        self.wave_transform = torchaudio.transforms.Spectrogram(
            n_fft=512, win_length=128, hop_length=10, power=1)

    def forward(self, x):
        image = self.wave_transform(x)
        image = torch.clip(image, min=0, max=10000) / 1000
        n, c, h, w = image.size()
        image = image[:, :, :int(20 / 100 * h + 10), :]
        return image


class Model(nn.Module):
    """X3D-L backbone with the classification head stripped to a feature extractor."""

    def __init__(self):
        super().__init__()
        model_name = "x3d_l"
        self.net = torch.hub.load('facebookresearch/pytorchvideo',
                                  model_name, pretrained=True)
        # Global pooling so that any input size works.
        self.net.blocks[5].pool.pool = nn.AdaptiveAvgPool3d(1)
        # self.net.blocks[5]=nn.Identity()
        # self.net.avgpool = nn.Identity()
        # Disable the rest of the original head; only pooled features remain.
        self.net.blocks[5].dropout = nn.Identity()
        self.net.blocks[5].proj = nn.Identity()
        self.net.blocks[5].activation = nn.Identity()
        self.net.blocks[5].output_pool = nn.Identity()

    def forward(self, x):
        x = self.net(x)
        return x


class Net(nn.Module):
    # Note: num_classes is not used; the head below always has 6 outputs.
    def __init__(self, num_classes=1):
        super().__init__()
        self.preprocess50s = Transform50s()
        self.preprocess10s = Transform10s()
        self.model = Model()
        self.pool = nn.AdaptiveAvgPool3d(1)
        self.fc = nn.Linear(2048, 6, bias=True)
        self.drop = nn.Dropout(0.5)

    def forward(self, eeg):
        # do preprocess
        bs = eeg.size(0)
        eeg_50s = eeg
        eeg_10s = eeg[:, :, 4000:6000]  # central slice of the window
        x_50 = self.preprocess50s(eeg_50s)
        x_10 = self.preprocess10s(eeg_10s)
        # Stack both spectrogram sets along the channel axis.
        x = torch.cat([x_10, x_50], dim=1)
        # Add a colour axis and repeat it 3 times for the pretrained 3D CNN.
        x = torch.unsqueeze(x, dim=1)
        x = torch.cat([x, x, x], dim=1)
        x = self.model(x)
        # x = self.pool(x)
        x = x.view(bs, -1)
        x = self.drop(x)
        x = self.fc(x)
        return x
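The tensor shapes in the spectrogram pipeline above can be checked with plain arithmetic. The sketch below is a pure-Python illustration, not part of the original solution. It assumes the bs x 16 x 10000 EEG input described in the EEG model section and torchaudio's default center=True framing, under which a signal of length L yields 1 + L // hop_length frames. The helper names `spectrogram_shape` and `cropped_bins` are hypothetical.

```python
def spectrogram_shape(num_samples, n_fft=512, hop_length=50):
    """Return (freq_bins, frames) for a one-sided STFT with center=True padding."""
    freq_bins = n_fft // 2 + 1
    frames = 1 + num_samples // hop_length
    return freq_bins, frames


def cropped_bins(freq_bins):
    """Number of low-frequency bins kept by the crop in Transform50s/Transform10s."""
    return int(20 / 100 * freq_bins + 10)


# Full window: 10000 samples, hop 50 (assumed input length).
h50, w50 = spectrogram_shape(10000, hop_length=50)
# Central slice eeg[:, :, 4000:6000]: 2000 samples, hop 10.
h10, w10 = spectrogram_shape(2000, hop_length=10)

kept = cropped_bins(h50)
channels = 16  # assumed number of EEG channels

# Per-sample shape after torch.cat(dim=1), unsqueeze(dim=1) and the 3x repeat.
x3d_input_shape = (3, 2 * channels, kept, w50)
print(h50, w50, h10, w10, kept, x3d_input_shape)
# prints: 257 201 257 201 61 (3, 32, 61, 201)
```

Under these assumptions both transforms produce the same 61 x 201 grid, which is why they can be concatenated along the channel axis. The 32 stacked channel maps then sit on the axis that X3D treats as time (depth).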
02
EEG model
Treat the EEG (bs x 16 x 10000) as an image by unsqueezing dim=1 (bs x 1 x 16 x 10000). Because the time dimension is too large, we reshape it here.
import timm
import torch
import torch.nn as nn


class Net(nn.Module):
    def __init__(self):
        super().__init__()
        self.model = timm.create_model('efficientnet_b5', pretrained=True, in_chans=3)
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.fc = nn.Linear(2048, out_features=6, bias=True)
        self.dropout = nn.Dropout(p=0.5)

    def extract_features(self, x):
        feature1 = self.model.forward_features(x)
        return feature1

    def forward(self, x):
        bs = x.size(0)
        # (bs, 16, 10000) -> (bs, 16, 1000, 10): split time into 1000 groups of 10.
        reshaped_tensor = x.view(bs, 16, 1000, 10)
        # -> (bs, 16, 10, 1000)
        reshaped_and_permuted_tensor = reshaped_tensor.permute(0, 1, 3, 2)
        # -> (bs, 160, 1000): each channel becomes 10 rows of length 1000.
        reshaped_and_permuted_tensor = reshaped_and_permuted_tensor.reshape(bs, 16 * 10, 1000)
        # Add a colour axis and repeat it 3 times for the pretrained 2D CNN.
        x = torch.unsqueeze(reshaped_and_permuted_tensor, dim=1)
        x = torch.cat([x, x, x], dim=1)
        x = self.extract_features(x)
        x = self.pool(x)
        x = x.view(bs, -1)
        x = self.dropout(x)
        x = self.fc(x)
        return x
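To see what the reshape in `forward` does to the time axis, the following pure-Python sketch reproduces the index mapping of view(bs, 16, 1000, 10), permute(0, 1, 3, 2) and reshape(bs, 160, 1000) for a single sample. It is an illustration only, and the function name `source_of` is hypothetical.

```python
def source_of(row, col, phases=10):
    """Map pixel (row, col) of the 160 x 1000 image back to (channel, time index).

    view(16, 1000, 10) places time index t at [c, t // 10, t % 10];
    permute swaps the last two axes; reshape merges (channel, phase) into rows.
    """
    channel, phase = divmod(row, phases)
    t = col * phases + phase
    return channel, t


# Row 0 holds channel 0 at t = 0, 10, 20, ...
print([source_of(0, col) for col in range(3)])  # [(0, 0), (0, 10), (0, 20)]
# Row 1 holds the same channel shifted by one sample.
print([source_of(1, col) for col in range(3)])  # [(0, 1), (0, 11), (0, 21)]
# Row 10 is the first row of channel 1.
print(source_of(10, 0))                         # (1, 0)
# The last pixel is the last sample of the last channel.
print(source_of(159, 999))                      # (15, 9999)
```

In other words, each EEG channel becomes 10 adjacent rows, each one the signal downsampled by 10 with a different starting offset, so the 16 x 10000 strip turns into a 160 x 1000 image with a far less extreme aspect ratio.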
03
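EEG + spectrogram
Use X3D-L to extract features from the spectrograms (only Transform50s is used here) and EfficientNet-B5 to extract features from the raw EEG.
Combining these models gives my current score. The final score comes from an ensemble of 6 models (0.28 -> 0.27 on the private LB).
The final ensemble includes:
Weights: [0.1, 0.1, 0.2, 0.2, 0.2, 0.2].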
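A weighted ensemble with these weights could look like the sketch below. The post does not say how the weights are applied, so this assumes a plain weighted average of each model's per-class probabilities. The function `blend` and the dummy predictions are hypothetical and only illustrate the arithmetic.

```python
WEIGHTS = [0.1, 0.1, 0.2, 0.2, 0.2, 0.2]  # from the post; they sum to 1


def blend(model_probs, weights=WEIGHTS):
    """Weighted average of per-class probabilities from several models.

    model_probs: one probability vector (list of floats) per model.
    """
    if len(model_probs) != len(weights):
        raise ValueError("need exactly one prediction per weight")
    num_classes = len(model_probs[0])
    total = sum(weights)  # normalise in case the weights do not sum to exactly 1
    return [
        sum(w * probs[k] for w, probs in zip(weights, model_probs)) / total
        for k in range(num_classes)
    ]


# Dummy predictions for one sample: 6 models x 6 classes (illustrative values only).
uniform = [1 / 6] * 6
confident = [1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
preds = [confident, confident, uniform, uniform, uniform, uniform]
blended = blend(preds)
print(blended)
```

Because every input is a probability vector and the weights are normalised, the blended output is again a valid probability vector over the 6 classes.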