Empty PCA matrix

haketest lv.1

Posted: 2022-05-11 05:30:50

Tags: #data

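For a pathway p_i, I want to extract data from matrix G to form an intermediate matrix B ∈ R^(n × r_i), where r_i is the number of genes involved in pathway p_i. In other words, matrix B has the samples in its rows and the genes of the given pathway in its columns.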
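To make the shape of B concrete, here is a minimal sketch of picking one pathway's genes out of a samples-by-genes matrix. The `pathway_genes` mapping and the `pathway_matrix` helper are hypothetical; the post does not say which genes belong to which pathway, so the membership below is made up for illustration only.

```python
import pandas as pd

# Toy expression matrix G: samples in rows, genes in columns
# (values taken from the example dataframe in this post).
G = pd.DataFrame(
    {"DIABLO": [0.6747, -1.4892, -2.0670],
     "MRPL33": [0.0051, 0.2122, -0.6536],
     "RBM39": [-0.4994, -0.2472, -0.1614]},
    index=["TCGA-28-5207-01", "TCGA-02-0089-01", "TCGA-87-5896-01"],
)

# Hypothetical membership: NOT a real pathway-to-gene mapping.
pathway_genes = {"pathway_A": ["DIABLO", "RBM39", "NOT_MEASURED"]}

def pathway_matrix(G, genes):
    """Return B: all samples (rows), restricted to the pathway genes present in G (columns)."""
    present = [g for g in genes if g in G.columns]
    return G.loc[:, present]

B = pathway_matrix(G, pathway_genes["pathway_A"])
print(B.shape)  # (3, 2): n samples x r_i measured pathway genes
```

Genes that are listed for the pathway but not measured are skipped here; whether that is the right policy depends on the data.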

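The intermediate matrix B is the transpose of the common_mrna dataframe and has 347 x 8053 dimensions. kegg_list has the same length (347 items) as the number of columns in the common_mrna dataframe (8053 x 347 dimensions), which means each pathway corresponds to a row number (not an index label) of the dataframe. If the position matches the dataframe's row position, I want to append each row to the empty matrix B, then transpose B and convert it into a 347 x 8053 dataframe.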
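If B really is just the transpose of common_mrna, it can probably be built without any loop. The sketch below uses the three-gene example dataframe from this post rather than the full 8053 x 347 data, so the shapes are smaller.

```python
import numpy as np
import pandas as pd

common_mrna = pd.DataFrame(
    [[0.6747, -1.4892, -2.0670, 0.2337, 0.1255],
     [0.0051, 0.2122, -0.6536, 1.3746, -1.6958],
     [-0.4994, -0.2472, -0.1614, 0.9809, 1.3159]],
    columns=["TCGA-28-5207-01", "TCGA-02-0089-01", "TCGA-87-5896-01",
             "TCGA-06-5410-01", "TCGA-16-0861-01"],
    index=["DIABLO", "MRPL33", "RBM39"],
)

# Transpose so that samples become rows and genes become columns.
B = common_mrna.T.to_numpy()
print(B.shape)  # (5, 3)
```

Keeping `common_mrna.T` as a dataframe instead of calling `to_numpy()` would preserve the sample and gene labels.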

Next, using PCA, I want to decompose matrix B into uncorrelated components, yielding G_pi ∈ R^(n × q), where q = 5 is the number of principal components (PCs).
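A minimal sketch of this PCA step with scikit-learn is shown below. The matrix B here is random stand-in data (20 samples x 8 genes), not the real expression values. Note that PCA cannot return more components than min(n_samples, n_features), so q is capped accordingly.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
B = rng.normal(size=(20, 8))  # stand-in data: 20 samples x 8 pathway genes

q = 5
# Cap the number of components at what the data can support.
n_components = min(q, *B.shape)

pca = PCA(n_components=n_components)
G_pi = pca.fit_transform(B)  # scores, shape (n_samples, n_components)

# The score columns are uncorrelated: off-diagonal covariances are ~0.
cov = np.cov(G_pi, rowvar=False)
off_diag = cov - np.diag(np.diag(cov))
print(G_pi.shape, np.allclose(off_diag, 0, atol=1e-8))
```

Passing `n_components` to the constructor avoids computing all components and then slicing `[:, 0:q]` afterwards.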

The problem with my code below is that it produces an empty dataframe G.

Code:

import numpy as np
import pandas as pd
from sklearn.decomposition import PCA

p = [] # Initialize pathway list (columns)
G = np.zeros((8053,347)) # Initialize mRNA expression matrix
B = np.zeros((8053,347)) # Initialize intermediate matrix B
q = 5 # Number of PCs

# Populate intermediate matrix B
for i, p in enumerate(kegg_list):
  Bi = 0
  for index, row in common_mrna.iterrows():
    if i==len(index):
      Bi2 = Bi
      np.append(B[Bi], row)
      Bi2 = Bi2 + 1
B = B.transpose()

# PCA for yielding matrix G
pca_G = PCA()
pca_G.fit(B)
np.append(G, pca_G.transform(B)[:,0:q])
G = pd.DataFrame(G)
G.to_csv("./gbm_tcga/PCA_mrna.csv", index=False)
G
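One likely reason G stays all zeros: `np.append` returns a new array and never modifies its input, so both `np.append(...)` calls above discard their result. In addition, `len(index)` is the length of the row label (a gene-name string, assuming the index looks like the example dataframe), so `i == len(index)` does not compare row positions. A small demonstration of the `np.append` behaviour, using made-up arrays:

```python
import numpy as np

G = np.zeros((2, 3))
scores = np.ones((2, 3))  # stand-in for pca_G.transform(B)[:, 0:q]

np.append(G, scores)          # return value discarded, G is unchanged
unchanged = not G.any()       # True: G is still all zeros

flat = np.append(G, scores)   # without axis=..., the result is flattened to 1-D
print(unchanged, flat.shape)  # True (12,)

G = scores                    # assigning the scores directly is probably what was intended
```

Assigning the transformed array to G directly, rather than appending it to a pre-allocated zero matrix, sidesteps the issue.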

Example common_mrna dataframe:

common_mrna = pd.DataFrame([[0.6747, -1.4892, -2.0670, 0.2337, 0.1255], [0.0051, 0.2122, -0.6536, 1.3746, -1.6958], [-0.4994, -0.2472, -0.1614, 0.9809, 1.3159]], columns=['TCGA-28-5207-01', 'TCGA-02-0089-01','TCGA-87-5896-01', 'TCGA-06-5410-01','TCGA-16-0861-01'], index=["DIABLO", "MRPL33", "RBM39"])

Example kegg_list:

    kegg_list = ['Glycolysis_/_Gluconeogenesis',
     'Citrate_cycle_(TCA_cycle)',
     'Pentose_phosphate_pathway',
     'Pentose_and_glucuronate_interconversions',
     'Fructose_and_mannose_metabolism',
     'Galactose_metabolism']

Expected output:

B = array([[ 0.6747,  0.0051, -0.4994], [-1.4892,  0.2122, -0.2472], [-2.067 , -0.6536, -0.1614], [0.2337, 1.3746, 0.9809], [0.1255, -1.6958, 1.3159]])
