VOOZH about

URL: https://dev.to/ecostatlab/scatter-plots-with-linear-regression-in-r-with-multi-panel-layouts-3c25

⇱ Scatter Plots with Linear Regression in R (with Multi-Panel Layouts) - DEV Community


The Problem

You have environmental data (pH, organic matter, nutrients) and biodiversity data (Shannon index). You want to know: which environmental factors significantly affect diversity?

The quickest way to answer this is scatter plots + linear regression.

Here's a complete, runnable R workflow using real soil microbiome data — from a simple lm() call to a multi-panel publication-ready figure. Everything uses base R graphics, no ggplot2 required.

Data

👁

This dataset has 24 soil samples with 6 environmental factors and 2 diversity indices:

Variable Description Unit
pH Soil acidity
AK Available potassium mg/kg
AP Available phosphorus mg/kg
AN Available nitrogen mg/kg
OM Organic matter g/kg
W Water content %
Shannon Species diversity
PD Phylogenetic diversity
# Read and inspectdf<-read.csv("sandian.csv",header=TRUE)str(df)head(df)

Single Factor: pH → Shannon

x<-df$pHy<-df$Shannonfit<-lm(y~x)summary(fit)
Multiple R-squared: 0.0004, p-value: 0.927

R² = 0.0004, p = 0.927 — pH has no significant linear relationship with Shannon diversity in this dataset. That's OK. Non-significant results are normal in real research.

Plot it:

par(mar=c(5,5,3,1))plot(x,y,xlab="pH",ylab="Shannon Index",pch=16,cex=1.5,col="#00AFBB",cex.lab=1.8,cex.axis=1.3,las=1)abline(fit,lwd=3,col="#00AFBB")

👁

Multi-Panel: All 6 Factors at Once

Doing one plot at a time is inefficient. Use layout() + for loop:

x_vars<-c("pH","AK","AP","AN","OM","W")x_labels<-c("pH","AK (mg/kg)","AP (mg/kg)","AN (mg/kg)","SOM (g/kg)","W (%)")layout(matrix(1:6,2,3,byrow=TRUE))par(mar=c(4.5,4.5,3,1))for(iinseq_along(x_vars)){fit<-lm(df$Shannon~df[[x_vars[i]]])s<-summary(fit)plot(df[[x_vars[i]]],df$Shannon,xlab=x_labels[i],ylab="Shannon",pch=16,cex=1.3,col="#00AFBB",cex.lab=1.5,cex.axis=1.2,las=1)abline(fit,lwd=3,col="#00AFBB")# Annotate R² and p-valuer2<-round(s$r.squared,3)p<-s$coefficients[2,4]p_label<-ifelse(p<0.001,"p<0.001",paste0("p=",round(p,3)))x_pos<-par("usr")[2]-diff(par("usr")[1:2])*0.25y_pos<-par("usr")[4]-diff(par("usr")[3:4])*0.1text(x_pos,y_pos,labels=bquote(R^2==.(r2)~" "~.(p_label)),cex=1.1,adj=c(1,1))}

Save at 300 dpi for publication:

png("scatter_6panel.png",width=3000,height=2000,res=300)# ... plotting code ...dev.off()

👁

Results at a Glance

Factor p-value Significant?
pH 0.000 0.927
AK 0.088 0.160
AP 0.012 0.615
AN 0.220 0.021
OM 0.540 <0.001 ✅✅
W 0.000 0.956

Organic matter (OM) is the strongest predictor — it explains 54% of the variance in Shannon diversity.

Complete Script (Copy & Run)

# ================================================================# Scatter Plots + Linear Regression: Environmental Factors vs. Diversity# ================================================================plot_one<-function(x,y,xlab,ylab){fit<-lm(y~x)s<-summary(fit)r2<-round(s$r.squared,3)p<-s$coefficients[2,4]p_label<-ifelse(p<0.001,"p<0.001",paste0("p=",round(p,3)))plot(x,y,xlab=xlab,ylab=ylab,pch=16,cex=1.3,col="#00AFBB",cex.lab=1.5,cex.axis=1.2,las=1)abline(fit,lwd=3,col="#00AFBB")x_pos<-par("usr")[2]-diff(par("usr")[1:2])*0.25y_pos<-par("usr")[4]-diff(par("usr")[3:4])*0.1text(x_pos,y_pos,labels=bquote(R^2==.(r2)~" "~.(p_label)),cex=1.1,adj=c(1,1))}df<-read.csv("sandian.csv",header=TRUE)x_vars<-c("pH","AK","AP","AN","OM","W")x_labels<-c("pH","AK (mg/kg)","AP (mg/kg)","AN (mg/kg)","SOM (g/kg)","W (%)")png("scatter_6panel.png",width=3000,height=2000,res=300)layout(matrix(1:6,2,3,byrow=TRUE))par(mar=c(4.5,4.5,3,1))for(iinseq_along(x_vars)){plot_one(df[[x_vars[i]]],df$Shannon,x_labels[i],"Shannon")}dev.off()

To adapt for your data: change x_vars to your column names, x_labels to your axis labels, and df$Shannon to your diversity index.

Parameter Cheat Sheet

param what it does common values
pch point shape 16 (filled), 1 (empty), 17 (triangle)
cex point/text size 1.3–1.5 (moderate)
col color #00AFBB (teal), #E69F00 (orange)
lty line type 1 (solid), 2 (dashed)
lwd line width 2–3 (publication)
res export dpi 300 (journal)

When R² Is Not Significant

  • Normal. Don't only show significant results.
  • Try non-linear: poly(x, 2) for quadratic terms.
  • Small n (< 30) : may lack statistical power, not ecological meaning.

*Follow me on Dev.to for more R tutorials.

Tags: r tutorial datascience ecology