
BarnebyLives relies on a variety of data being downloaded locally to your computer in order to work. My instance, which covers a large area, essentially from Michigan to California, and from Texas in the south to the Canadian border in the north, is around 16 GiB. Most regional collections are likely to be in the realm of 4-10 GiB; relative to a normal R package that is a lot of data, but relative to a typical geospatial workflow it is essentially no data at all. Given how much disk space modern computers have, I think this amount is trivial.

It’s important that you decide where BarnebyLives will be installed on your computer. While I have a couple of computers, I maintain only a single instance of BarnebyLives. It lives on one of my 3.5 terabyte hard drives, which I have set up for various science projects.

I will be downloading and copying some of the data to that location, where I have a directory named ‘Barneby_Lives-dev’.

Helper functions

Unfortunately, setting up a BL instance is not totally automated by the package. The data come from a wide variety of sources, and there are two which I cannot, for the life of me, figure out how to download programmatically. One of them is a very strange Minio server; if you are a Minio expert, please make an issue in the GH repo and let’s try to figure this out!

Taxonomic Data

The World Checklist of Vascular Plants provides the local taxonomic information required to perform much of the name matching. We use it to ensure that queries sent to Plants of the World Online are spelled correctly, in order to reduce the number of queries we make to it. The publication “The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity” serves as the basis for Kew’s Plants of the World Online and details the process behind the data set. This is not that large a file, so it will download in a couple of minutes.

This function simply takes the name of the output directory where we would like to save our data, along with the `bound` object (which we define in the geographic-data section below).

TaxUnpack(path = '/media/steppe/hdd/BL_sandbox/taxdata-MI', bound = bound)

The way that BarnebyLives matches spellings of names is very simple. It creates three tables: one large table containing all of the species in the WFO database, a moderately small table containing all infraspecific taxa (varieties and subspecies), and a small table of all of the genera.

However, both the genus and specific-epithet tables are abbreviated: matches are found using only the leading two or three characters. This gives rise to one of BarnebyLives(!) limitations, the assumption that the first two characters of the genus, and the first three letters of the specific epithet, are spelled correctly.
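For illustration only, here is a minimal sketch of that leading-character idea using a toy vector of genera; the tables and matching code inside BarnebyLives may differ.

genera <- c('Astragalus', 'Asclepias', 'Artemisia', 'Carex', 'Castilleja')
query  <- 'Astragalis' # a misspelled genus
# keep only genera sharing the first two characters, then take the closest match
candidates <- genera[startsWith(genera, substr(query, 1, 2))]
candidates[which.min(adist(query, candidates))] # "Astragalus"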

Now we will also prepare the author-abbreviation data set. These data come from IPNI and were minimally processed on my end; see the script within the data-raw directory on GitHub for details. We will save a copy of them in the taxonomy directory as well.

data('ipni_authors', package='BarnebyLives')
p2 <- '/media/steppe/hdd/BL_sandbox/taxdata-MI'
write.csv(ipni_authors, file.path(p2, 'ipni_author_abbreviations.csv'))

Download all geographic data and set up an instance

The package has one function, data_download, which is able to download most of the data you need automatically. A second function, tileSelector, will help you determine which files you need to download manually.

# I am using R Markdown, so I want to be really sure the data are saved to the
# correct location; hence the absolute path
data_download(path = '/media/steppe/hdd/BL_sandbox/geodata_raw')
# if you are using an interactive R script (file.R) you should be able to do something
# similar, or even just set the working directory (setwd()) to wherever you want to
# save everything and run the function as `data_download()`

We will define a geographic extent that covers our area of interest. It will be a simple square, and grabbing coordinates from OpenStreetMap, or a similar service, will suffice.

bound <- data.frame(
   y = c( 48,  48,  41,  41,  48),
   x = c(-91, -82, -82, -91, -91)
 ) # note that the 5th pair of coordinates is the same
# as the first pair (48, -91), this is required to 'close'
# the square. 

tileSelector(bound)
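If you want to double-check the box before selecting tiles, a minimal sketch using the sf package (my assumption; this step does not require it) looks like this.

library(sf)
sq <- st_polygon(list(as.matrix(bound[, c('x', 'y')]))) # errors if the ring is not closed
plot(sq) # should draw the bounding square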

We will download a few attributes from Geomorpho90m from a Minio page. We need both ‘aspect’ (NOT ‘sine’ or ‘cosine’) and ‘geom’ from here; you should be able to Ctrl+F for these in their respective directories and manually download them quickly. Once the files are downloaded, copy them to the same location where you had data_download save files.

Halfway there! Now we also need to manually download elevation data from MERIT-DEM. Yes, you technically need to register here, but you will automatically get a login password sent to any email you enter, and it works! These tiles follow the same naming convention as the former data set - in fact, the previous data set is derived from this one. Also save these files to the location where you had data_download save files.
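If you would rather shuttle the manually downloaded tiles around from within R, something along these lines works. Both paths and the .tif pattern are assumptions on my part, so adjust them for your machine and for whatever format the tiles arrive in.

downloads <- '~/Downloads' # wherever your browser saved the tiles
raw_dir   <- '/media/steppe/hdd/BL_sandbox/geodata_raw' # same location data_download used
tiles <- list.files(downloads, pattern = '\\.tif$', full.names = TRUE)
file.copy(tiles, raw_dir)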

Now that you have all of the data, we can set up an instance; to do this, all we need to do is run the following function. Specify where the downloaded data are (‘path’), where to save the data (‘pathOut’), the square (or bounding box) around your study area (‘bound’), and whether to remove the raw downloaded files (‘cleanup’). I always leave cleanup = FALSE until I am 100% certain that everything is all good, which means running data through the pipeline and printing labels a couple of times.

data_setup( # this one will take maybe 30 minutes or so - a "start it and walk
  # away, or catch up on emails" type of event.
  path = '/media/steppe/hdd/BL_sandbox/geodata_raw', 
  pathOut = '/media/steppe/hdd/BL_sandbox/geodata-MI',
  bound = bound, cleanup = FALSE
  )

Finishing up

And that’s it! After a few runs of the system, feel free to delete the raw data. You can do that by hand, or by re-running the ‘data_setup’ function with cleanup = TRUE; but that kind of seems like overkill!
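If “by hand” for you means from the R console, a one-liner like the following does the trick; double-check the path first, as unlink() is not reversible.

unlink('/media/steppe/hdd/BL_sandbox/geodata_raw', recursive = TRUE) # removes the raw downloads only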