Leveraging Biotic Interaction Knowledge Graph and Network Analysis to Uncover Insect Vectors of Plant Virus


February 28, 2024


Background: Insect vectors spread 80% of plant viruses, causing major losses in agricultural production. Identifying insect vectors directly is difficult because of a wide range of hosts, limited detection methods, and the high cost and expertise that PCR requires. The biodiversity database Global Biotic Interactions (GloBI) provides an opportunity to identify virus vectors from its interaction data.

Objective: This study aims to build an insect vector search engine that constructs a virus-insect-plant interaction knowledge graph, identifies insect vectors using network analysis, and extends knowledge about the identified insect vectors.

Methods: We leverage GloBI data to construct a graph that captures the complex relationships between insects, viruses, and plants. We identify insect vectors using interaction analysis and taxonomy analysis, then combine the two into a final score. For the interaction analysis, we propose Targeted Node Centric-Degree Centrality (TNC-DC), which finds insects with many direct and indirect connections to the virus. Finally, in the knowledge extension stage, we integrate Wikidata, DBpedia, and NCBIOntology to provide comprehensive information about the insect vectors.
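The idea behind a target-centric degree measure can be illustrated with a minimal sketch. This abstract does not give the TNC-DC formula, so the scoring below is an assumption: it counts a candidate's direct link to the target virus plus a down-weighted count of two-step links through shared neighbours. The node names, the `indirect_weight` and `alpha` parameters, and the taxonomy scores are all hypothetical placeholders, not GloBI data or the authors' implementation.

```python
from collections import defaultdict


def build_graph(interactions):
    """Build an undirected adjacency map from (source, target) pairs."""
    adj = defaultdict(set)
    for a, b in interactions:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def plain_degree(adj, candidates):
    """Ordinary degree: number of neighbours, regardless of the target."""
    return {c: len(adj.get(c, set())) for c in candidates}


def target_centric_score(adj, target, candidates, indirect_weight=0.5):
    """Assumed scoring: direct link to the target, plus weighted two-step links."""
    target_neighbours = adj.get(target, set())
    scores = {}
    for c in candidates:
        neighbours = adj.get(c, set())
        direct = 1 if target in neighbours else 0
        # Shared neighbours correspond to paths candidate - x - target.
        indirect = len((neighbours & target_neighbours) - {c, target})
        scores[c] = direct + indirect_weight * indirect
    return scores


def combine(interaction, taxonomy, alpha=0.5):
    """Assumed combination: weighted sum of interaction and taxonomy scores."""
    return {c: alpha * interaction[c] + (1 - alpha) * taxonomy.get(c, 0.0)
            for c in interaction}


# Toy interaction list (placeholder names, purely illustrative).
edges = [
    ("insect_A", "virus_V"), ("insect_A", "plant_1"), ("insect_A", "plant_2"),
    ("virus_V", "plant_1"), ("virus_V", "plant_2"),
    ("bee_B", "plant_1"), ("bee_B", "plant_3"), ("bee_B", "plant_4"),
    ("bee_B", "plant_5"), ("bee_B", "plant_6"), ("bee_B", "plant_7"),
]
candidates = ["insect_A", "bee_B"]

adj = build_graph(edges)
degrees = plain_degree(adj, candidates)
scores = target_centric_score(adj, "virus_V", candidates)
taxonomy_scores = {"insect_A": 0.9, "bee_B": 0.2}  # hypothetical values
final = combine(scores, taxonomy_scores)
ranking = sorted(final, key=final.get, reverse=True)
print(ranking)
```

In this toy graph the generalist `bee_B` has the highest plain degree because it visits many plants, yet it scores low once only connections that lead to the target virus are counted, which is the kind of failure the target-centric measure is meant to avoid.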

Results: An interaction graph was created for each test virus. At the test stage, the interaction and taxonomy analysis achieved a precision of 0.80. TNC-DC overcame a failure of the original degree centrality, which always returned bees among the predicted vectors. During the knowledge extension stage, we found the natural enemy of Bemisia tabaci (an insect vector of Pepper Yellow Leaf Curl Virus). Furthermore, an insect vector search engine was developed. For each predicted insect vector, the search engine provides network analysis insights, common names, photos, descriptions, natural enemies, other species, and relevant publications.

Conclusion: The insect vector search engine correctly identified virus vectors using GloBI data, TNC-DC, and entity embedding. Average precision across the tests was 0.80. Note that some correct insect vectors appear within the top five ranked predictions rather than at the first rank.


Keywords: Knowledge Graph, Network Analysis, Degree Centrality, Entity Embedding, Insect Vector