From Text to Metadata: Automated Product Tagging with Python and Natural Language Processing

July 2024
IDA document: 3002727
FFRDC: Systems and Analyses Center
Type: Documents
Division: Information Technology and Systems Division
Authors:
Aayushi Verma, Omar A. Khan
IDA produces research deliverables in various formats for our sponsors. Summarizing these products quickly enough to support efficient information retrieval on a given research topic is a challenge. IDA has led numerous initiatives to tag products with IDA-defined taxonomies of research terms, but tagging is a manual, time-consuming process that must be repeated periodically to cover newer products. To address this challenge, we have developed a Python-based automated tagging pipeline. In this article, we discuss the pipeline’s mechanics, current results, and future applications. This article was published in Volume 45, Issue 3 of The ITEA Journal of Test and Evaluation.
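The abstract does not say how the pipeline assigns tags, so the following is only a minimal sketch of one common approach to taxonomy-based tagging: counting how often each taxonomy term's phrases appear in a document. The taxonomy contents, the `tag_document` function, and the `min_hits` threshold are all hypothetical names chosen for illustration and are not taken from the article.

```python
import re
from collections import Counter

# Hypothetical taxonomy: tag -> phrases that signal it.
# These entries are illustrative assumptions, not IDA's actual taxonomy.
TAXONOMY = {
    "natural language processing": ["natural language processing", "nlp", "text mining"],
    "test and evaluation": ["test and evaluation", "operational testing"],
    "cybersecurity": ["cybersecurity", "cyber security", "network defense"],
}


def tag_document(text, taxonomy=TAXONOMY, min_hits=1):
    """Return (tag, hit_count) pairs with at least min_hits matches, most frequent first."""
    # Lowercase and collapse whitespace so multi-word phrases match across line breaks.
    normalized = re.sub(r"\s+", " ", text.lower())
    hits = Counter()
    for tag, phrases in taxonomy.items():
        for phrase in phrases:
            # Word boundaries keep "nlp" from matching inside a longer token.
            pattern = r"\b" + re.escape(phrase) + r"\b"
            hits[tag] += len(re.findall(pattern, normalized))
    return [(tag, count) for tag, count in hits.most_common() if count >= min_hits]


if __name__ == "__main__":
    sample = "This report applies NLP and text mining to operational testing data."
    print(tag_document(sample))
```

Simple phrase counting is easy to audit against a fixed taxonomy, but it misses synonyms and paraphrases; a production pipeline would likely need lemmatization or a statistical model on top of it.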