这里以一个湿地物种组成的数据为例来看物种组成和环境条件之间的关系。数据包含植物组成和湿地水环境化学组成。
library(vegan)
## Loading required package: permute
## Loading required package: lattice
## This is vegan 2.4-4
chem <- read.delim('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/chemistry.txt', row.names = 1)
chem <- chem[,-15] # removes slope, which is not chemical variables
PCA <- rda(chem,scale = T) #the argument scale standarizes the variables
head(summary(PCA))
##
## Call:
## rda(X = chem, scale = T)
##
## Partitioning of correlations:
## Inertia Proportion
## Total 14 1
## Unconstrained 14 1
##
## Eigenvalues, and their contribution to the correlations
##
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Eigenvalue 4.3861 1.8311 1.6322 1.21612 0.92221 0.89790 0.72924
## Proportion Explained 0.3133 0.1308 0.1166 0.08687 0.06587 0.06414 0.05209
## Cumulative Proportion 0.3133 0.4441 0.5607 0.64754 0.71341 0.77755 0.82963
## PC8 PC9 PC10 PC11 PC12 PC13
## Eigenvalue 0.63060 0.53379 0.40476 0.29093 0.22711 0.18121
## Proportion Explained 0.04504 0.03813 0.02891 0.02078 0.01622 0.01294
## Cumulative Proportion 0.87468 0.91280 0.94172 0.96250 0.97872 0.99166
## PC14
## Eigenvalue 0.11672
## Proportion Explained 0.00834
## Cumulative Proportion 1.00000
##
## Scaling 2 for species and site scores
## * Species are scaled proportional to eigenvalues
## * Sites are unscaled: weighted dispersion equal on all dimensions
## * General scaling constant of scores: 5.574992
##
##
## Species scores
##
## PC1 PC2 PC3 PC4 PC5 PC6
## Ca -1.2395 0.26073 -0.08279 0.07019 -0.21455 0.31631
## Mg -1.2847 -0.07431 0.03779 -0.03741 -0.05047 0.34011
## Fe 0.2337 -0.74467 -0.18301 0.91026 -0.36570 0.12707
## K -0.8757 -1.02193 0.02213 -0.23047 0.24166 -0.06626
## Na -0.9686 -0.65514 -0.03874 0.17597 0.40968 -0.29167
## Si -0.8009 -0.94159 0.13567 -0.40204 -0.33330 0.26581
## ....
##
##
## Site scores (weighted sums of species scores)
##
## PC1 PC2 PC3 PC4 PC5 PC6
## 1 -1.298386 -1.6531 0.32020 -0.410813 -0.27795 1.1076
## 2 0.331266 0.3436 0.87967 -0.112347 0.05342 -0.5761
## 3 -0.578705 1.0812 -0.49588 -0.339626 0.30775 -0.4156
## 4 -0.517608 1.1056 0.04964 0.101172 0.03688 0.1731
## 5 -0.759527 -0.1386 0.32003 -0.230092 -0.74608 0.2601
## 6 -0.004093 0.7204 0.44371 0.005363 0.04296 -0.2904
## ....
从结果上看总的变异为14, 第一轴解释了4.3861, 即解释了31%。在这里,因为我们对所有的变量进行过标准化处理(即每一个变量的均值为0,标准差为1)。我们一共包含14个变量,即总变异为14.
stand.chem <- scale(chem)
stand.chem.var <- apply(stand.chem, 2, var)
stand.chem.var
## Ca Mg Fe K Na Si SO4 PO4 NO3
## 1 1 1 1 1 1 1 1 1
## NH3 Cl Corg pH conduct
## 1 1 1 1 1
sum(stand.chem.var)
## [1] 14
如果想进一步知道每一个变量对于第一轴和第二轴到底贡献了多少我们可以利用score
loadings <- scores(PCA, display = "species", scaling = 0)
loadings
## PC1 PC2
## Ca -0.397207108 0.12931763
## Mg -0.411695240 -0.03685437
## Fe 0.074879310 -0.36934346
## K -0.280639035 -0.50686184
## Na -0.310399857 -0.32493807
## Si -0.256661125 -0.46701175
## SO4 -0.217276338 0.09023013
## PO4 0.115086359 -0.25838198
## NO3 0.034618703 0.03814265
## NH3 0.174227393 -0.02750712
## Cl 0.007920445 0.02129442
## Corg 0.322058906 -0.21255537
## pH -0.307202160 0.28944051
## conduct -0.368754666 0.24163441
## attr(,"const")
## [1] 5.574992
注意如果在这里不进行变量的标准化,可能会引发一些问题。如果某一变量本身数值较大,其变异程度可能也会很大,这会影响整体分析。另外,具体画图可以适用biplot功能
biplot(PCA, display = "species", scaling = "species")

怎么去理解这个结果?我们可以把第一轴看作是养分轴,即poor-rich gradient.
另一个栗子:对数据进行转化的情况.这里我们介绍一个在进行pca分析前对数据进行hellinger转化。 套路:读入数据 —— 数据转化 —— pca分析 —— 做图
vltava.spe <- read.delim ('https://raw.githubusercontent.com/zdealveindy/anadat-r/master/data/vltava-spe.txt', row.names = 1)
vltava.spe.hel <- decostand(log1p(vltava.spe), "hellinger")
tbPCA <- rda(vltava.spe.hel)
source ('http://www.davidzeleny.net/anadat-r/doku.php/en:numecolr:cleanplot.pca?do=export_code&codeblock=0') # define the cleanplot.pca function
## Warning in file(filename, "r", encoding = encoding): "internal" method
## cannot handle https redirection to: 'https://www.davidzeleny.net/anadat-r/
## doku.php/en:numecolr:cleanplot.pca?do=export_code&codeblock=0'
## Warning in file(filename, "r", encoding = encoding): "internal" method
## failed, so trying "libcurl"
## Warning in file(filename, "r", encoding = encoding): "internal" method
## cannot handle https redirection to: 'https://www.davidzeleny.net/anadat-r/
## doku.php/en:numecolr:cleanplot.pca?do=export_code&codeblock=1'
## Warning in file(filename, "r", encoding = encoding): "internal" method
## failed, so trying "libcurl"
cleanplot.pca(tbPCA)
多说一句,上图中的scaling 1 和 2啥意思呢? 左图为1的情况,即关注点在对象(或者理解为不同站点,数据中的排)。右图为2的情况,即关注点在描述某一站点的组分(或者理解为物种,即数据中的列)。