Voozh

The Problem

You have environmental data (pH, organic matter, nutrients) and biodiversity data (Shannon index). You want to know: which environmental factors significantly affect diversity?

The quickest way to answer this is scatter plots + linear regression.

Here's a complete, runnable R workflow using real soil microbiome data — from a simple lm() call to a multi-panel publication-ready figure. Everything uses base R graphics, no ggplot2 required.

Data

👁

This dataset has 24 soil samples with 6 environmental factors and 2 diversity indices:

Variable	Description	Unit
pH	Soil acidity	—
AK	Available potassium	mg/kg
AP	Available phosphorus	mg/kg
AN	Available nitrogen	mg/kg
OM	Organic matter	g/kg
W	Water content	%
Shannon	Species diversity	—
PD	Phylogenetic diversity	—

# Read and inspectdf<-read.csv("sandian.csv",header=TRUE)str(df)head(df)

Single Factor: pH → Shannon

x<-df$pHy<-df$Shannonfit<-lm(y~x)summary(fit)

Multiple R-squared: 0.0004, p-value: 0.927

R² = 0.0004, p = 0.927 — pH has no significant linear relationship with Shannon diversity in this dataset. That's OK. Non-significant results are normal in real research.

Plot it:

par(mar=c(5,5,3,1))plot(x,y,xlab="pH",ylab="Shannon Index",pch=16,cex=1.5,col="#00AFBB",cex.lab=1.8,cex.axis=1.3,las=1)abline(fit,lwd=3,col="#00AFBB")

👁

Multi-Panel: All 6 Factors at Once

Doing one plot at a time is inefficient. Use layout() + for loop:

x_vars<-c("pH","AK","AP","AN","OM","W")x_labels<-c("pH","AK (mg/kg)","AP (mg/kg)","AN (mg/kg)","SOM (g/kg)","W (%)")layout(matrix(1:6,2,3,byrow=TRUE))par(mar=c(4.5,4.5,3,1))for(iinseq_along(x_vars)){fit<-lm(df$Shannon~df[[x_vars[i]]])s<-summary(fit)plot(df[[x_vars[i]]],df$Shannon,xlab=x_labels[i],ylab="Shannon",pch=16,cex=1.3,col="#00AFBB",cex.lab=1.5,cex.axis=1.2,las=1)abline(fit,lwd=3,col="#00AFBB")# Annotate R² and p-valuer2<-round(s$r.squared,3)p<-s$coefficients[2,4]p_label<-ifelse(p<0.001,"p<0.001",paste0("p=",round(p,3)))x_pos<-par("usr")[2]-diff(par("usr")[1:2])*0.25y_pos<-par("usr")[4]-diff(par("usr")[3:4])*0.1text(x_pos,y_pos,labels=bquote(R^2==.(r2)~" "~.(p_label)),cex=1.1,adj=c(1,1))}

Save at 300 dpi for publication:

png("scatter_6panel.png",width=3000,height=2000,res=300)# ... plotting code ...dev.off()

👁

Results at a Glance

Factor	R²	p-value	Significant?
pH	0.000	0.927	❌
AK	0.088	0.160	❌
AP	0.012	0.615	❌
AN	0.220	0.021	✅
OM	0.540	<0.001	✅✅
W	0.000	0.956	❌

Organic matter (OM) is the strongest predictor — it explains 54% of the variance in Shannon diversity.

Complete Script (Copy & Run)

# ================================================================# Scatter Plots + Linear Regression: Environmental Factors vs. Diversity# ================================================================plot_one<-function(x,y,xlab,ylab){fit<-lm(y~x)s<-summary(fit)r2<-round(s$r.squared,3)p<-s$coefficients[2,4]p_label<-ifelse(p<0.001,"p<0.001",paste0("p=",round(p,3)))plot(x,y,xlab=xlab,ylab=ylab,pch=16,cex=1.3,col="#00AFBB",cex.lab=1.5,cex.axis=1.2,las=1)abline(fit,lwd=3,col="#00AFBB")x_pos<-par("usr")[2]-diff(par("usr")[1:2])*0.25y_pos<-par("usr")[4]-diff(par("usr")[3:4])*0.1text(x_pos,y_pos,labels=bquote(R^2==.(r2)~" "~.(p_label)),cex=1.1,adj=c(1,1))}df<-read.csv("sandian.csv",header=TRUE)x_vars<-c("pH","AK","AP","AN","OM","W")x_labels<-c("pH","AK (mg/kg)","AP (mg/kg)","AN (mg/kg)","SOM (g/kg)","W (%)")png("scatter_6panel.png",width=3000,height=2000,res=300)layout(matrix(1:6,2,3,byrow=TRUE))par(mar=c(4.5,4.5,3,1))for(iinseq_along(x_vars)){plot_one(df[[x_vars[i]]],df$Shannon,x_labels[i],"Shannon")}dev.off()

To adapt for your data: change x_vars to your column names, x_labels to your axis labels, and df$Shannon to your diversity index.

Parameter Cheat Sheet

param	what it does	common values
`pch`	point shape	16 (filled), 1 (empty), 17 (triangle)
`cex`	point/text size	1.3–1.5 (moderate)
`col`	color	`#00AFBB` (teal), `#E69F00` (orange)
`lty`	line type	1 (solid), 2 (dashed)
`lwd`	line width	2–3 (publication)
`res`	export dpi	300 (journal)

When R² Is Not Significant

Normal. Don't only show significant results.
Try non-linear: poly(x, 2) for quadratic terms.
Small n (< 30) : may lack statistical power, not ecological meaning.

*Follow me on Dev.to for more R tutorials.

Tags: r tutorial datascience ecology

URL: https://dev.to/ecostatlab/scatter-plots-with-linear-regression-in-r-with-multi-panel-layouts-3c25

⇱ Scatter Plots with Linear Regression in R (with Multi-Panel Layouts) - DEV Community