This vignette is intended to give users a better idea of the particulars of data organization in locaR, especially when using the localizeMultiple() function.
As described in the other vignettes, using the localizeMultiple() function requires that data be organized in a specific way: recording sessions, or “surveys”, must be set up with a given folder structure. The folder structure is as follows:
So, for a given project, one might have a project folder on their computer, and within it several survey folders, each corresponding to a different date and time (if different surveys occur at the same date and time, e.g. at different sites, put them in separate folders). As described in the “Detecting sound sources” vignette, locaR reads several files for localization. A few that warrant further description are the coordinates, channels, adjustments, detections, and settings files.
Each of these files is a .csv file. The use of csv files was aimed at making each file easily editable in other programs (e.g. Microsoft Excel).
The coordinates file is simply a spreadsheet of spatial coordinates for all microphones. Personally, I just have one big master spreadsheet containing the coordinates for all localization projects, with one row per microphone location. I have found that this works best, because I can always be confident that the master spreadsheet contains the most accurate versions of the coordinates (sometimes we have one set of coordinates taken with a handheld GPS, then another more accurate set taken by a survey-grade GPS). Having a master set of coordinates simplifies things, and also means that I only need one coordinates file for all localization projects.
Here’s what a coordinates file looks like:
head(read.csv(system.file('extdata', "Vignette_Coordinates.csv", package = 'locaR')))
#> Station Zone Easting Northing Elevation
#> 1 Ex-1 12 371296.0 5934638 738.0433
#> 2 Ex-2 12 371335.9 5934642 738.5402
#> 3 Ex-3 12 371373.9 5934642 738.5402
#> 4 Ex-4 12 371297.5 5934599 731.2959
#> 5 Ex-5 12 371339.0 5934599 733.7477
#> 6 Ex-6 12 371377.5 5934601 737.7347
There are five columns. Station is the unique name for each microphone location. Zone is the UTM zone, which is not used by any aspect of locaR. Easting, Northing, and Elevation are the x-, y-, and z-coordinates, respectively, of each microphone location. There can be other columns (e.g. a “Comments” column, or “Latitude” and “Longitude” columns with decimal degrees), but these will be ignored.
It is imperative that the Easting, Northing, and Elevation coordinates are measured in meters.
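A quick check along these lines can catch a malformed coordinates file before localization. This is only a sketch: checkCoordinates() is a hypothetical helper, not a locaR function, and the data frame is typed in by hand from the first rows shown above so that the example runs on its own.

```r
# Hypothetical helper (not part of locaR): checks that a coordinates
# data frame has the columns described above and that they are usable.
checkCoordinates <- function(coords) {
  required <- c("Station", "Easting", "Northing", "Elevation")
  missingCols <- setdiff(required, names(coords))
  if (length(missingCols) > 0) {
    stop("Missing column(s): ", paste(missingCols, collapse = ", "))
  }
  if (anyDuplicated(coords$Station) > 0) {
    stop("Station names must be unique.")
  }
  xyz <- coords[, c("Easting", "Northing", "Elevation")]
  if (!all(vapply(xyz, is.numeric, logical(1))) || any(is.na(xyz))) {
    stop("Easting, Northing and Elevation must be numeric, with no missing values.")
  }
  invisible(TRUE)
}

# First three rows of the example coordinates file, entered by hand.
coords <- data.frame(
  Station = c("Ex-1", "Ex-2", "Ex-3"),
  Zone = 12,
  Easting = c(371296.0, 371335.9, 371373.9),
  Northing = c(5934638, 5934642, 5934642),
  Elevation = c(738.0433, 738.5402, 738.5402),
  stringsAsFactors = FALSE
)
checkCoordinates(coords)
```

A check like this cannot tell whether the values are actually in meters; that still has to be confirmed by hand.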
The channels file is constructed once per survey, and specifies which microphone to select from each recording unit.
head(read.csv(system.file('extdata', "Vignette_Channels.csv", package = 'locaR')))
#> Station Channel
#> 1 Ex-1 1
#> 2 Ex-2 1
#> 3 Ex-3 1
#> 4 Ex-4 1
#> 5 Ex-5 1
#> 6 Ex-6 1
It is a very simple, two-column spreadsheet, in which the first column is named Station, and the second is named Channel. The Station column contains location names that must match names in the coordinates file. The Channel column contains 1’s or 2’s specifying the channel to use for localization. Channel 1 is the left channel, Channel 2 is the right channel.
If working with stereo data, and wanting to select the right channel for some units and the left for others, this can be specified in the channels file by editing the desired rows. If working with mono data, the channels file is irrelevant; simply leave it blank (a blank version is created by the setupSurvey() function). That is what was done in the example provided in the “Intro to localizeMultiple()” vignette.
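The two requirements on the channels file (station names that match the coordinates file, and channel values of 1 or 2) can be checked with a few lines of base R. The helper checkChannels() below is hypothetical and not part of locaR; the small tables are made up for illustration.

```r
# Hypothetical helper (not part of locaR): checks a channels table
# against the station names in a coordinates table.
checkChannels <- function(channels, coords) {
  unknown <- setdiff(channels$Station, coords$Station)
  if (length(unknown) > 0) {
    stop("Stations not found in the coordinates file: ",
         paste(unknown, collapse = ", "))
  }
  if (!all(channels$Channel %in% c(1, 2))) {
    stop("Channel must be 1 (left) or 2 (right).")
  }
  invisible(TRUE)
}

coords <- data.frame(Station = c("Ex-1", "Ex-2", "Ex-3"),
                     stringsAsFactors = FALSE)
channels <- data.frame(Station = c("Ex-1", "Ex-2", "Ex-3"),
                       Channel = c(1, 1, 2),  # right channel for Ex-3 only
                       stringsAsFactors = FALSE)
checkChannels(channels, coords)
```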
Hopefully you will not need to create and use an adjustments file. It is only needed if the file names turn out not to reflect the true start time of the recording. I have had this happen with Wildlife Acoustics recordings on occasion. In such cases, the file name suggests a certain start time, but the real start time was one second later. My inspections tend to reveal that the file was otherwise synchronized; it was just named wrong. Fortunately, the error tends to be exactly one second; by correcting for that one-second difference, the file becomes synchronized and can be used. It’s a peculiar error that Wildlife Acoustics employees told me is related to a poorly calibrated GPS. Since I ran into this error frequently enough, I added functionality to locaR to deal with it. This seemed preferable to identifying and renaming files, since renaming could cause issues (for example, if I have backups of those same files, the names will not match).
Although adjustment files are not used in any of the vignettes, here is an example taken from one of my own projects:
head(read.csv(system.file('extdata', "Vignette_Adjustments.csv", package = 'locaR')))
#> Filename Difference
#> 1 TDLO-001-261_0+1_20200612$090000.wav 1
#> 2 TDLO-001-260_0+1_20200612$090000.wav 1
#> 3 TDLO-001-262_0+1_20200612$090000.wav 1
#> 4 TDLO-001-263_0+1_20200612$090000.wav 1
#> 5 TDLO-001-190_0+1_20200612$090000.wav 1
The first column, Filename, gives the original file name. The second column, Difference, indicates the amount by which the file name was incorrect, in seconds. Positive numbers indicate that the actual start time occurred after the start time indicated by the file name, and negative numbers indicate that the actual start time occurred before it. In the example data, all file names were exactly 1 second off: all of these files actually started at 9:00:01, but their file names indicate they started at 9:00:00. When an adjustments file is provided to the setupSurvey() function and other subsequent functions, locaR will automatically add the appropriate amount of white noise to the beginning of the recording (in this case 1 second) to bring the files into alignment prior to localization/visualization.
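To see why padding the start fixes the alignment, consider a file named 9:00:00 that really started at 9:00:01: its first sample belongs one second into the survey, so one second of filler has to be placed in front of it. The sketch below illustrates the idea on a plain numeric vector. It is not locaR's implementation; padStart(), the noise level, and the sample rate are assumptions made for the example, and negative differences are left out.

```r
# Hypothetical helper (not locaR's code): delay a recording by `difference`
# seconds by prepending low-amplitude white noise.
padStart <- function(samples, sampleRate, difference, noiseSd = 1e-4) {
  if (difference < 0) {
    stop("Negative differences are not handled in this sketch.")
  }
  nPad <- round(difference * sampleRate)
  c(rnorm(nPad, mean = 0, sd = noiseSd), samples)
}

sampleRate <- 24000  # assumed sample rate, in Hz
# One second of a 1 kHz tone standing in for the recording.
samples <- sin(2 * pi * 1000 * (0:(sampleRate - 1)) / sampleRate)

set.seed(1)
padded <- padStart(samples, sampleRate, difference = 1)

# Duration in seconds after padding: the original 1 s plus 1 s of noise.
length(padded) / sampleRate
```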
Again, it’s a peculiar problem, and hopefully not one you ever need to deal with!
The detections file is where information about each sound of interest is entered.
head(read.csv(system.file('extdata', "Vignette_Detections_20200617_090000.csv", package = 'locaR')))
#> Station1 Station2 Station3 Station4 Station5 Station6 From To F_Low F_High
#> 1 Ex-8 Ex-5 Ex-6 Ex-9 Ex-4 NA 0.8 1.1 2000 6500
#> 2 Ex-8 Ex-5 Ex-6 Ex-9 Ex-4 NA 1.9 2.2 2000 6500
#> 3 Ex-8 Ex-5 Ex-6 Ex-9 Ex-4 NA 2.8 3.1 2000 6500
#> 4 Ex-8 Ex-5 Ex-6 Ex-9 Ex-4 NA 4.2 4.5 2000 6500
#> 5 Ex-8 Ex-5 Ex-6 Ex-9 Ex-4 NA 5.0 5.3 2000 6500
#> 6 Ex-8 Ex-5 Ex-6 Ex-9 Ex-4 NA 6.1 6.4 2000 6500
#> Species Individual Comments
#> 1 REVI 1
#> 2 REVI 1
#> 3 REVI 1
#> 4 REVI 1
#> 5 REVI 1
#> 6 REVI 1
Various pieces of information specific to each particular sound are entered. The first six columns (Station1 to Station6) contain the Station (i.e. location) names. These must match names provided in the coordinates file. If a column contains NA (or a blank), it will be ignored. Currently, locaR is intended for using up to 6 microphones for localization; adding more is unlikely to boost accuracy, and comes at a computational cost.
The From and To columns contain the start and end times of the sound of interest, in seconds relative to the beginning of the recording session. F_Low and F_High contain the low and high frequency of the sound of interest.
Those are the only columns that are actually used for localization. The Species, Individual and Comments columns are just for record-keeping. My preferred approach is to write down the species, the individual of that species (numbered starting at 1), and any comments about that sound (e.g. if it is overlapped, if it might be outside the array, etc.).
If a row has no information in any columns except the Comments column, that row will be ignored. This can be useful for record-keeping when a sound source is outside the array.
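Because detections are entered by hand, it can be worth screening the file for typos before running a localization. The sketch below uses a hypothetical helper, checkDetections(), which is not part of locaR. It assumes that a row with no station names is a comment-only row and skips it, and the example table is made up (row 2 contains two deliberate mistakes).

```r
# Hypothetical helper (not part of locaR): returns a character vector
# describing detection rows that look unusable for localization.
checkDetections <- function(detections, coords) {
  stationCols <- grep("^Station[0-9]+$", names(detections), value = TRUE)
  problems <- character(0)
  for (i in seq_len(nrow(detections))) {
    stations <- unlist(detections[i, stationCols], use.names = FALSE)
    stations <- stations[!is.na(stations) & stations != ""]
    # Assumption: rows without any station are comment-only rows; skip them.
    if (length(stations) == 0) next
    unknown <- setdiff(stations, coords$Station)
    if (length(unknown) > 0) {
      problems <- c(problems, sprintf("Row %d: unknown station(s) %s",
                                      i, paste(unknown, collapse = ", ")))
    }
    if (!isTRUE(detections$From[i] < detections$To[i])) {
      problems <- c(problems, sprintf("Row %d: From must be less than To", i))
    }
    if (!isTRUE(detections$F_Low[i] < detections$F_High[i])) {
      problems <- c(problems, sprintf("Row %d: F_Low must be less than F_High", i))
    }
  }
  problems
}

coords <- data.frame(Station = c("Ex-4", "Ex-5", "Ex-8"),
                     stringsAsFactors = FALSE)
detections <- data.frame(
  Station1 = c("Ex-8", "Ex-8", NA),
  Station2 = c("Ex-5", "Ex-7", NA),   # "Ex-7" is not in coords
  Station3 = c("Ex-4", "Ex-4", NA),
  From = c(0.8, 2.2, NA),             # row 2: From is later than To
  To = c(1.1, 1.9, NA),
  F_Low = c(2000, 2000, NA),
  F_High = c(6500, 6500, NA),
  Comments = c("", "", "Song heard well outside the array"),
  stringsAsFactors = FALSE
)
checkDetections(detections, coords)
```

An empty result means no problems were found; here the two messages both point at row 2, and the comment-only third row is skipped.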
The settings file is the file that brings everything together in one place. It contains file paths to point towards the relevant data structures described above, as well as other relevant survey-specific information such as the temperature, assumed speed of sound, etc. An example:
read.csv(system.file('extdata', "Ex_20200617_090000_Settings.csv", package = 'locaR'), stringsAsFactors = FALSE)
#> Setting Value
#> 1 DetectionsFile Ex_20200617_090000_Run1_Detections.csv
#> 2 CoordinatesFile Vignette_Coordinates.csv
#> 3 SiteWavsFolder
#> 4 AdjustmentsFile
#> 5 ChannelsFile Ex_20200617_090000_Channels.csv
#> 6 Date 20200617
#> 7 Time 90000
#> 8 tempC 15
#> 9 soundSpeed
#> 10 SurveyLength 7
#> 11 Margin 10
#> 12 Zmin -1
#> 13 Zmax 20
#> 14 Resolution 1
#> 15 Buffer 0.2
The first column contains the name of the setting, and the second column contains the value for that setting. The second column can be manually edited as desired, and this will affect the subsequent localization results.
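When inspecting a settings file by hand, it can be convenient to turn the two columns into a named list so that each value can be looked up by its setting name. This is only an illustration in base R, not how locaR reads the file internally; the table is a hand-typed subset of the example above.

```r
# A hand-typed subset of the example settings table. All values are text,
# as they would be when a column mixes file names and numbers.
settings <- data.frame(
  Setting = c("DetectionsFile", "Date", "Time", "tempC", "soundSpeed", "Resolution"),
  Value = c("Ex_20200617_090000_Run1_Detections.csv", "20200617", "90000",
            "15", "", "1"),
  stringsAsFactors = FALSE
)

# Named list: one element per setting, looked up by name.
s <- setNames(as.list(settings$Value), settings$Setting)

# Convert individual values as needed.
resolution <- as.numeric(s$Resolution)
tempC <- as.numeric(s$tempC)

# A blank value (here soundSpeed) is an empty string.
soundSpeedIsBlank <- s$soundSpeed == ""

# The Time value 90000 has lost its leading zero; pad it back to six digits.
timeHHMMSS <- sprintf("%06d", as.integer(s$Time))
```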
The first five rows point towards the other files and folders. Date is an integer with 8 digits in the format YYYYMMDD. Time has either 5 or 6 digits, in the format HMMSS or HHMMSS.
tempC is the temperature in degrees Celsius. soundSpeed is the speed of sound in meters per second. If soundSpeed is not defined, tempC will be used to define the speed of sound in air. If soundSpeed is defined, tempC is ignored.
SurveyLength is the length of the survey in seconds (recordings can be longer than the period to be surveyed, e.g. if you only want to survey the first minute of a recording session).
Margin is the amount of space around the outside of the array to search for sound sources, in meters. Zmin and Zmax are the amount, in meters, to search below the lowest microphone and above the highest microphone, respectively. Resolution is the size of each grid cell, in meters along each side, in the search grid.
Buffer is the amount of time, in seconds, to extract around each detection. This can be important when localizing very short sounds, because transmission delays between microphones can be longer than the sound itself, so that without a buffer the sound would be missed on some microphones.
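Two of these settings are easy to illustrate in a few lines of R. The first part of the sketch shows the soundSpeed/tempC fallback using a standard approximation for the speed of sound in dry air; the exact formula locaR uses is not given here, so soundSpeedFromTemp() and resolveSoundSpeed() are hypothetical stand-ins. The second part shows how Margin and Resolution could translate into the horizontal extent of a search grid, using three microphone positions from the example coordinates. The vertical axis (Zmin and Zmax) is left out of the sketch.

```r
# Hypothetical helper (not locaR's code): speed of sound in dry air, in m/s,
# from temperature in degrees Celsius (standard approximation).
soundSpeedFromTemp <- function(tempC) {
  331.3 * sqrt(1 + tempC / 273.15)
}

# Use soundSpeed when it is defined; otherwise fall back on tempC.
# A blank setting is represented by NA, e.g. the result of as.numeric("").
resolveSoundSpeed <- function(soundSpeed, tempC) {
  if (is.na(soundSpeed)) soundSpeedFromTemp(tempC) else soundSpeed
}

resolveSoundSpeed(soundSpeed = NA, tempC = 15)   # about 340.3 m/s
resolveSoundSpeed(soundSpeed = 343, tempC = 15)  # 343; tempC is ignored

# Horizontal search grid: extend the array by Margin on every side and
# step through it in cells of size Resolution (both in meters).
easting <- c(371296.0, 371335.9, 371373.9)
northing <- c(5934638, 5934642, 5934642)
margin <- 10
resolution <- 1

xs <- seq(min(easting) - margin, max(easting) + margin, by = resolution)
ys <- seq(min(northing) - margin, max(northing) + margin, by = resolution)

# Number of horizontal grid cells to be searched.
length(xs) * length(ys)
```

Halving Resolution roughly quadruples the number of horizontal cells, which is why finer grids and larger margins come at a computational cost.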