|
Title:
|
TOWARDS USING VISION LANGUAGE MODELS FOR URBAN TREE ANALYSIS |
|
Author(s):
|
Danilo Jodas, Giovani Candido, João Manesco, Gabriel Garcia, Luiz Marques Junior and João Papa |
|
ISBN:
|
978-989-8704-71-9 |
|
Editors:
|
Paula Miranda and Pedro Isaías |
|
Year:
|
2025 |
|
Edition:
|
Single |
|
Keywords:
|
Generative Models, Large Language Models, Urban Tree, Bode, GemBode |
|
Type:
|
Full Paper |
|
First Page:
|
123 |
|
Last Page:
|
130 |
|
Language:
|
English |
|
Paper Abstract:
|
In an era of sustainable cities and carbon dioxide emission reduction, the efficient and rapid collection of data on urban trees has drawn the attention of municipalities and forestry managers. The standard approach relies on fieldwork campaigns to obtain dendrometric information, including tree height and species, which supports an initial assessment and subsequent cataloging. However, field analysis is labor-intensive and time-consuming because trees are widely dispersed across urban areas. Thus, there is a significant need for computational strategies that enable quick assessment and recording of key urban tree information. This paper introduces an approach based on a vision language model that summarizes the main attributes of a tree from a street-view image. In addition, a new question-and-answer dataset with summaries of tree information was created using a 7B-parameter Portuguese language model. The dataset is then used, together with images of urban trees, to fine-tune the PaliGemma vision language model. The results indicate that the proposed approach can accurately summarize key tree attributes, such as height and species, using language models with significantly fewer parameters. |