Dataset Card for LILA

Dataset Summary

LILA Camera Traps is an aggregate data set of images taken by camera traps, which are devices that automatically (e.g. via motion detection) capture images of wild animals to help ecological research.

This is the first time that disparate camera trap data sets have been aggregated into a single training environment with a single taxonomy.

This data set consists only of camera trap image data sets. The broader LILA website also hosts other data sets related to biology and conservation, and is intended as a resource for both machine learning (ML) researchers and those who want to harness ML for this topic.

See below for information about each specific dataset that LILA contains:

Supported Tasks and Leaderboards

No leaderboards exist for LILA.

Languages

The LILA taxonomy is provided in English.

Dataset Structure

Data Instances

The data annotations are provided in COCO Camera Traps format.

All of the datasets share a common category taxonomy, which is defined on the LILA website.
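As a rough illustration of the upstream COCO Camera Traps layout, the sketch below builds a tiny in-memory example and joins annotations to images. All ids, file names, and category names are made up, `labels_by_image` is a hypothetical helper, and the records exposed through the Hugging Face loader may be shaped differently from this raw file format.

```python
import json

# Hypothetical minimal COCO Camera Traps structure; every value is invented.
coco_camera_traps = {
    "info": {"version": "1.0", "description": "Example dataset"},
    "images": [
        {"id": "img_0001", "file_name": "loc_01/img_0001.jpg",
         "width": 2048, "height": 1536, "location": "loc_01"},
        {"id": "img_0002", "file_name": "loc_02/img_0002.jpg",
         "width": 2048, "height": 1536, "location": "loc_02"},
    ],
    "categories": [
        {"id": 0, "name": "empty"},
        {"id": 1, "name": "bobcat"},
    ],
    "annotations": [
        # bbox is [x, y, width, height] in pixels and is optional
        {"id": "ann_0001", "image_id": "img_0001", "category_id": 1,
         "bbox": [100.0, 200.0, 300.0, 150.0]},
        {"id": "ann_0002", "image_id": "img_0002", "category_id": 0},
    ],
}


def labels_by_image(coco):
    """Map each image id to the list of category names annotated on it."""
    names = {c["id"]: c["name"] for c in coco["categories"]}
    result = {img["id"]: [] for img in coco["images"]}
    for ann in coco["annotations"]:
        result[ann["image_id"]].append(names[ann["category_id"]])
    return result


print(json.dumps(labels_by_image(coco_camera_traps), indent=2))
```

Annotations refer to images and categories by id, so one image can carry several annotations, or only an "empty" label.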

Data Fields

Different datasets may have slightly varying fields, which include:

file_name: the file name
width and height: the dimensions of the image
study: which research study the image was collected as part of
location: the name of the location at which the image was taken
annotations: information about image annotation, which includes the taxonomy information, bounding box/boxes (bbox/bboxes) if any, as well as any other annotation information.
image: the path to download the image and any other information that is available, e.g. its size in bytes.
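Because fields vary between configurations, defensive access can help. The sketch below uses a hypothetical record shaped after the field list above. The nesting of `annotations` is inferred from the taxonomy examples later in this card, the `bbox` key name is an assumption, and the `image` field is omitted because its exact structure is not described here. `first_annotation` is an illustrative helper, not part of the dataset.

```python
# Hypothetical record; values are invented. Taxonomy entries are ClassLabel
# integers, or None when a taxonomic level is not set.
record = {
    "file_name": "example/0001.jpg",
    "width": 2048,
    "height": 1536,
    "study": "Example Study",
    "location": "loc_01",
    "annotations": {
        "taxonomy": [{"genus": 12, "species": 34}],
        "bbox": [[100.0, 200.0, 300.0, 150.0]],  # assumed key name
    },
}


def first_annotation(rec):
    """Return (taxonomy, bbox) for the first annotation, or (None, None)."""
    ann = rec.get("annotations") or {}
    taxonomy = (ann.get("taxonomy") or [None])[0]
    # The key may be spelled "bbox" or "bboxes" depending on the configuration.
    boxes = ann.get("bbox") or ann.get("bboxes") or [None]
    return taxonomy, boxes[0]


taxonomy, bbox = first_annotation(record)
print(taxonomy, bbox)
```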

Data Splits

This dataset does not have a predefined train/test split.
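Since no split is provided, users need to create their own. One option for camera trap data is to hold out whole locations, so that near-identical frames from the same camera do not end up on both sides of the split. This is a minimal sketch, assuming records are dicts with a `location` field as listed under Data Fields. `split_by_location` is a hypothetical helper, not part of the dataset or the `datasets` library.

```python
import random
from collections import defaultdict


def split_by_location(records, test_fraction=0.2, seed=0):
    """Split records so each location appears in exactly one of train or test."""
    by_location = defaultdict(list)
    for rec in records:
        by_location[rec["location"]].append(rec)

    # Sort before shuffling so the result depends only on the seed.
    locations = sorted(by_location)
    random.Random(seed).shuffle(locations)

    # Hold out at least one location for testing.
    n_test = max(1, round(len(locations) * test_fraction))
    test = [r for loc in locations[:n_test] for r in by_location[loc]]
    train = [r for loc in locations[n_test:] for r in by_location[loc]]
    return train, test


# Hypothetical records; only the "location" field matters here.
records = [{"file_name": f"img_{i}.jpg", "location": f"loc_{i % 5}"} for i in range(20)]
train, test = split_by_location(records)
print(len(train), len(test))
```

The fraction applies to locations rather than images, so the resulting image counts can be uneven when some locations contain far more images than others.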

Dataset Creation

Curation Rationale

The datasets that constitute LILA have been provided by the organizations, projects and researchers who collected them.

Source Data

Initial data collection and normalization

N/A

Who are the source language producers?

N/A

Annotations

Annotation process

Each dataset has been annotated by the members of the project/organization that provided it.

Who are the annotators?

The annotations have been provided by domain experts in fields such as biology and ecology.

Personal and Sensitive Information

Some of the original data sets included a “human” class label; for privacy reasons, these images were removed. Those labels are still present in the metadata. If those images are important to your work, contact the LILA maintainers, since in some cases it will be possible to release those images under an alternative license.

Considerations for Using the Data

Social Impact of Dataset

Machine learning depends on labeled data, but accessing such data in biology and conservation is a challenge. Consequently, everyone benefits when labeled data is made available. Biologists and conservation scientists benefit by having data to train on, and free hosting allows teams to multiply the impact of their data (we suggest listing this benefit in grant proposals that fund data collection). ML researchers benefit by having data to experiment with.

Discussion of Biases

These datasets do not represent global diversity, but are examples of local ecosystems and animals.

Other Known Limitations

N/A

Additional Information

Tutorial

The tutorial in this Google Colab notebook demonstrates how to work with this dataset, including filtering by species, collating configurations, and downloading images.

Working with Taxonomies

All taxonomy categories are stored as ClassLabels, which can be converted to strings as needed. Strings can likewise be converted to integers, which is useful for filtering the dataset. The example below filters the "Caltech Camera Traps" dataset to find all entries whose first annotation has the species "felis catus".

from datasets import load_dataset

dataset = load_dataset("society-ethics/lila_camera_traps", "Caltech Camera Traps", split="train")
taxonomy = dataset.features["annotations"].feature["taxonomy"]

# Convert the species name to its ClassLabel integer once, then keep only cats
cat_id = taxonomy["species"].str2int("felis catus")
cats = dataset.filter(lambda x: x["annotations"]["taxonomy"][0]["species"] == cat_id)

The original common names have been saved with their taxonomy mappings in this repository in common_names_to_tax.json. These can be used, for example, to map from a taxonomy combination to a common name, which makes queries more legible. Note, however, that a small number of common names are duplicated with different taxonomy values, and you will need to disambiguate these.
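One way to spot the ambiguous names is to group the JSON Lines rows by common name and keep those that map to more than one distinct taxonomy. The sketch below runs on made-up inline data rather than the real file. The `common_name` and `species` keys come from the examples in this card, while the other keys and all values are assumptions, and `find_ambiguous_names` is a hypothetical helper.

```python
import json
from collections import defaultdict

# Hypothetical JSON Lines content standing in for common_names_to_tax.json.
# Names and taxonomy values are invented; the real file has more levels.
SAMPLE_JSONL = """\
{"common_name": "example cat", "genus": "felis", "species": "felis catus"}
{"common_name": "example fox", "genus": "vulpes", "species": "vulpes vulpes"}
{"common_name": "example fox", "genus": "urocyon", "species": "urocyon cinereoargenteus"}
"""


def find_ambiguous_names(jsonl_text):
    """Return {common_name: [taxonomy, ...]} for names with several distinct taxonomies."""
    seen = defaultdict(list)
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        row = json.loads(line)
        name = row.pop("common_name")
        if row not in seen[name]:  # ignore exact repeats
            seen[name].append(row)
    return {name: rows for name, rows in seen.items() if len(rows) > 1}


ambiguous = find_ambiguous_names(SAMPLE_JSONL)
for name, rows in ambiguous.items():
    print(name, "->", [r["species"] for r in rows])
```

For a name flagged this way, a lookup by common name alone is not enough, and an additional taxonomy level has to be chosen explicitly.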

The following example loads the first "sea turtle" in the "Island Conservation Camera Traps" dataset.

import pandas as pd
from datasets import load_dataset

LILA_COMMON_NAMES_TO_TAXONOMY = pd.read_json("https://huggingface.co/datasets/society-ethics/lila_camera_traps/raw/main/data/common_names_to_tax.json", lines=True).set_index("common_name")
dataset = load_dataset("society-ethics/lila_camera_traps", "Island Conservation Camera Traps", split="train")
taxonomy = dataset.features["annotations"].feature["taxonomy"]

sea_turtle = LILA_COMMON_NAMES_TO_TAXONOMY.loc["sea turtle"].to_dict()
sea_turtle = {k: taxonomy[k].str2int(v) if v is not None else v for k, v in sea_turtle.items()} # Map to ClassLabel integers

# Keep entries whose first annotation matches the full sea turtle taxonomy
sea_turtle_dataset = dataset.filter(lambda x: x["annotations"]["taxonomy"][0] == sea_turtle)

The example below selects a random item from the dataset, and then maps from the taxonomy to a common name:

import numpy as np
import pandas as pd
from datasets import load_dataset

LILA_COMMON_NAMES_TO_TAXONOMY = pd.read_json("https://huggingface.co/datasets/society-ethics/lila_camera_traps/raw/main/data/common_names_to_tax.json", lines=True).set_index("common_name")

dataset = load_dataset("society-ethics/lila_camera_traps", "Caltech Camera Traps", split="train")
taxonomy = dataset.features["annotations"].feature["taxonomy"]

random_entry = dataset.shuffle()[0]
filter_taxonomy = random_entry["annotations"]["taxonomy"][0]

# Convert each taxonomy level that is set from its ClassLabel integer back to a string
filter_keys = [(k, taxonomy[k].int2str(v)) for k, v in filter_taxonomy.items() if v is not None]

if len(filter_keys) > 0:
    # Select the rows where every taxonomy level that is set matches
    print(LILA_COMMON_NAMES_TO_TAXONOMY[np.logical_and.reduce([
        LILA_COMMON_NAMES_TO_TAXONOMY[k] == v for k, v in filter_keys
    ])])
else:
    print("No common name found for the item.")

Dataset Curators

LILA BC is maintained by a working group that includes representatives from Ecologize, Zooniverse, the Evolving AI Lab, Snapshot Safari, and Microsoft AI for Earth. Hosting on Microsoft Azure is provided by Microsoft AI for Earth.

Licensing Information

Many, but not all, LILA data sets were released under the Community Data License Agreement (permissive variant). Check the details of the specific dataset you are using in its section above.

Citation Information

Citations for each dataset (if they exist) are provided in its section above.

Contributions

Thanks to @NimaBoscarino for adding this dataset.

Paper for society-ethics/lila_camera_traps

Paper • 2202.02283 • Published Feb 4, 2022

URL: https://huggingface.co/datasets/society-ethics/lila_camera_traps
