
R: Keep values below the 99% quantile by group in a data frame

Sometimes, especially in remote sensing, we have to deal with outliers. For example, an unusually long beetle dispersal distance (over 3 km) between trees infested in one year and trees infested in the next may simply result from a source tree located outside the study area. Therefore, estimates of population dispersal based on the locations of infested trees might not reflect true beetle dispersal.

Will the trends change after removing 0.5, 1 or 5% of the dataset? To investigate this, we first need to identify the corresponding quantiles (99.5%, 99% and 95%) and then keep only the values below them. If the data are grouped by a specific variable, this can be tricky. Let's look at how to keep values below the 99% quantile by group!
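Removing the top 0.5, 1 or 5% of observations amounts to keeping the values below the 99.5%, 99% and 95% quantiles, i.e. `probs = 1 - fraction removed`. A minimal sketch on an artificial vector (`x <- 1:1000`, chosen only because the counts are easy to check; it is not dispersal data):

```r
# Artificial example vector: the integers 1 to 1000 (not real dispersal data)
x <- 1:1000

# Fraction of observations to drop from the upper tail
removed <- c(0.005, 0.01, 0.05)

# Matching quantile probabilities: 0.995, 0.99 and 0.95
cutoffs <- quantile(x, probs = 1 - removed)

# Number of values strictly below each cutoff
kept <- sapply(cutoffs, function(q) sum(x < q))
kept
```

With R's default quantile definition (`type = 7`), this keeps 995, 990 and 950 of the 1000 values.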

Example:

I would like to remove values above the 99% quantile by group.

# create data frame

df <- data.frame(group = rep(c("A", "B"), each = 3), value = c(c(6, 5, 80, 4, 80) * 10, 3))

> df
  group value
1     A    60
2     A    50
3     A   800
4     B    40
5     B   800
6     B     3

Get the 99% quantile of each group:

quant <- aggregate(df$value, by = list(df$group), FUN = quantile, probs = 0.99)

> quant
  Group.1     x
1       A 785.2
2       B 784.8
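The `quant` table can also drive the filtering directly: look up each row's group cutoff with `match()` and compare. The sketch below adds a hypothetical group `C` whose values are all equal, to show one consequence of the strict `<` comparison: the 99% quantile of such a group equals its values, so the whole group is dropped. If "remove values above the 99% quantile" is meant literally, `<=` is the matching comparison.

```r
# Example data from above, plus a hypothetical group "C" whose three values are equal
df2 <- data.frame(group = rep(c("A", "B", "C"), each = 3),
                  value = c(60, 50, 800, 40, 800, 3, 7, 7, 7))

# Per-group 99% quantile, computed as above, with clearer column names
quant2 <- aggregate(df2$value, by = list(df2$group), FUN = quantile, probs = 0.99)
names(quant2) <- c("group", "q99")

# Cutoff of each row's group; match() keeps the original row order
cutoff <- quant2$q99[match(df2$group, quant2$group)]

# Strictly below the cutoff: group C vanishes, because 7 < 7 is FALSE
strict <- df2[df2$value < cutoff, ]

# At or below the cutoff: group C is kept, A and B still lose their 800
inclusive <- df2[df2$value <= cutoff, ]

strict
inclusive
```

Group `C` has a 99% quantile of exactly 7, so `strict` keeps four rows (groups A and B only) while `inclusive` keeps seven.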

Keep only the values below the 99% quantile of their group:

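# ave() applies the test within each group; its 0/1 result is turned back into TRUE/FALSE by as.logical()
df[with(df, as.logical(ave(value, group, FUN = function(x) x < quantile(x, probs = 0.99)))), ]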

Which results in:

  group value
1     A    60
2     A    50
4     B    40
6     B     3
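To answer the question from the introduction (do the trends change after removing 0.5, 1 or 5%?), the same filter can be wrapped in a small function and applied for several probabilities. `keep_below_quantile()` and its arguments are hypothetical names for this sketch; it computes a per-row cutoff with `ave()`, which avoids the `as.logical()` step, and assumes missing values should be ignored when computing the quantile and dropped from the result.

```r
# Hypothetical helper: keep rows whose value is strictly below the quantile of their group
keep_below_quantile <- function(data, value_col, group_col, prob) {
  # Per-row cutoff: the quantile of the row's group, ignoring NA values
  cutoff <- ave(data[[value_col]], data[[group_col]],
                FUN = function(v) quantile(v, probs = prob, na.rm = TRUE))
  # which() treats NA comparisons as FALSE, so rows with a missing value
  # are dropped instead of being returned as rows full of NA
  data[which(data[[value_col]] < cutoff), , drop = FALSE]
}

# Example data from above
df <- data.frame(group = rep(c("A", "B"), each = 3),
                 value = c(c(6, 5, 80, 4, 80) * 10, 3))

# Drop the top 0.5%, 1% and 5% of each group
probs <- c(top0.5 = 0.995, top1 = 0.99, top5 = 0.95)
kept <- lapply(probs, function(p) keep_below_quantile(df, "value", "group", p))

# Rows remaining for each threshold
sapply(kept, nrow)
```

With only three rows per group, every threshold removes the same value (800) from each group, so all three results have four rows; on a real dataset the counts would differ, and the analysis can be rerun on each element of `kept`.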

© 2023 by WRITERS INC. Proudly created with Wix.com