
Preparing to use BarnebyLives!
setting_up_files.Rmd
BarnebyLives relies on a variety of data which must be downloaded locally to your computer in order to work. My instance, which covers a large area, essentially from Michigan to California and from Texas north to the Canadian border, is around 16 GiB. Most more regional collections are likely to be in the realm of 4-10 GiB; so relative to a normal R package, A LOT of data, but relative to a typical geospatial workflow, essentially no data at all. Given how large the hard drives on modern computers are, I think this amount is trivial.
It’s important that you decide where BarnebyLives will be installed on your computer. While I have a couple of computers, I maintain only a single instance of BarnebyLives. It lives on one of my 3.5 terabyte hard drives, which I have set up for various science projects.
I will be downloading and copying some of the data to that location, where I have a directory named ‘Barneby_Lives-dev’.
Helper functions
Unfortunately, setting up a BL instance is not totally automated by the package. The data come from a wide variety of sources, and there are two which I cannot for the life of me figure out how to download from programmatically. One is a very strange MinIO server; if you are a MinIO expert, please open an issue in the GitHub repo and let's try to figure this out!
Taxonomic Data
The World Checklist of Vascular Plants (WCVP) provides the local taxonomic information required to perform much of the name matching. We use it to ensure that queries sent to Plants of the World Online are spelled correctly, in order to reduce the number of queries we make to that service. WCVP serves as the basis for Kew's Plants of the World Online; the publication "The World Checklist of Vascular Plants, a continuously updated resource for exploring global plant diversity" details the process behind the data set. This is not that large a file, so it will download in a couple of minutes.
This function simply takes the path to the output directory where we would like to save our data, along with the bounding box (`bound`) which we define below.
TaxUnpack(path = '/media/steppe/hdd/BL_sandbox/taxdata-MI', bound = bound)
The way that BarnebyLives matches spellings of names is very simple. It creates three tables: one large table containing all of the species in the WCVP database, one moderately small table containing all infraspecific taxa (varieties and subspecies), and one small table of all of the genera.
However, lookups against the genus and specific epithet tables are abbreviated: we find results from these tables based on only the leading two or three characters. This gives rise to one of BarnebyLives(!) limitations, the assumption that the first two characters of the genus and the first three letters of the specific epithet are spelled correctly.
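As a rough illustration of the idea (a hedged sketch only, not the package's internal code; the example `species` vector and the agrep() fuzzy step are mine):
# a minimal sketch of the matching idea, NOT the package's internal code
species <- c('Astragalus purshii', 'Astragalus utahensis', 'Asclepias speciosa')
query <- 'Astragaluss purshi' # a misspelled collection record
# keep only candidates which share the leading two characters,
# then fuzzy match the remainder with base R's agrep()
pool <- species[substr(species, 1, 2) == substr(query, 1, 2)]
agrep(query, pool, max.distance = 0.2, value = TRUE)
# should return "Astragalus purshii"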
Now we will also prepare the author abbreviations data set. These data come from IPNI and were minimally processed on my end; see the script within the data-raw directory on GitHub for details. We will save a copy of them in the taxonomy directory as well.
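Something along these lines would do it (the file names here are hypothetical; adjust them to wherever your copy of the IPNI abbreviations lives):
# hypothetical file names - point these at your own instance
authors <- read.csv('/media/steppe/hdd/BL_sandbox/ipni_author_abbreviations.csv')
write.csv(authors,
  file.path('/media/steppe/hdd/BL_sandbox/taxdata-MI', 'ipni_authors.csv'),
  row.names = FALSE)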
Download all geographic data and set up an instance
The package has one function, data_download, which is able to download most of the data you need automatically. A second function, tile_selector, will help you determine which files you need to download manually.
# I am using R Markdown, so I want to be really sure I save data to the correct
# location; hence I will use an absolute path
data_download(path = '/media/steppe/hdd/BL_sandbox/geodata_raw')
# if you are using an interactive R script (file.R), you should be able to do
# something similar, or even just setwd() into the directory where you want to
# save everything and run the function as `data_download()`
We will define a geographic extent that covers our area of interest. It will be a simple square; grabbing coordinates from OpenStreetMap, or a similar service, will suffice.
bound <- data.frame(
y = c( 48, 48, 41, 41, 48),
x = c(-91, -82, -82, -91, -91)
) # note that the 5th pair of coordinates is the same
# as the first pair (48, -91), this is required to 'close'
# the square.
tile_selector(bound)
We will download a few attributes from Geomorpho90m from a MinIO page. We need both ‘aspect’ (NOT ‘sine’ or ‘cosine’) and ‘geom’ from there; you should be able to Ctrl+F for these in their respective directories and manually download them quickly. Once the files are downloaded, copy them to the same location where you had data_download save files.
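If you grabbed them through a browser they probably landed in your Downloads folder; a sketch along these lines (both paths are assumptions) will move them over:
# both paths are assumptions - point them at your own folders
tiles <- list.files('~/Downloads', pattern = 'aspect|geom', full.names = TRUE)
file.copy(tiles, to = '/media/steppe/hdd/BL_sandbox/geodata_raw')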
Halfway there! Now we also need to manually download data on elevation from MERIT-DEM. Yes, you technically need to register here, but you will automatically get a login password sent to any email you enter, and it works! These tiles follow the same naming convention as the former data set; in fact, the previous data set is derived from this one. Also save these files to the location where you have data_download save files.
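Because the two products share a tiling scheme, a quick sanity check is to tally the tile identifiers you ended up with. This sketch assumes tile IDs of the form 'n45w090' and that everything sits in the geodata_raw directory; tweak both to match your files:
# path and tile-ID pattern are assumptions
f <- list.files('/media/steppe/hdd/BL_sandbox/geodata_raw')
table(regmatches(f, regexpr('[ns][0-9]+[ew][0-9]+', f)))
# each tile ID should appear once per product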
Now that you have all of the data, we can set up an instance; to do this, all we need to do is run the following function. Specify where the downloaded data are (‘path’), where to save the data (‘pathOut’), the square (or bounding box) around your study area (‘bound’), and whether to remove the raw downloaded files (‘cleanup’). I always leave cleanup = FALSE until I am 100% certain that everything is all good, which means running data through the pipeline and printing labels a couple of times.
data_setup( # this one will take maybe 30 minutes or so - a 'start it and
  # walk away' or 'catch up on emails' type event.
path = '/media/steppe/hdd/BL_sandbox/geodata_raw',
pathOut = '/media/steppe/hdd/BL_sandbox/geodata-MI',
bound = bound, cleanup = FALSE
)
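Once you have pushed a few collections through the pipeline and printed labels you are happy with, the raw downloads can be deleted to reclaim the space; for example (the path is an assumption):
# only run this once you are 100% certain the instance works!
unlink('/media/steppe/hdd/BL_sandbox/geodata_raw', recursive = TRUE)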