Neo4j LLM Knowledge Graph Builder: An AI tool for building knowledge graphs from unstructured data



https://github.com/neo4j-labs/llm-graph-builder

In the rapidly developing field of artificial intelligence, efficiently transforming unstructured data into organized, useful information is more important than ever. Recently, a team of researchers introduced the Neo4j LLM Knowledge Graph Builder, an AI tool designed to address this problem. It creates text-to-graph experiences by using machine learning models to transform unstructured text into extensive knowledge graphs.

The Neo4j LLM Knowledge Graph Builder is built on a collection of machine learning models and services, including OpenAI, Gemini, Llama3, Diffbot, Claude, and Qwen. Together they handle a wide variety of content formats, including PDFs, documents, images, web pages, and even YouTube video transcripts. The result has two parts, both stored in the Neo4j database: an entity graph of nodes and their relationships, and a lexical graph of text chunks with their embeddings.

One of the most important features of the Neo4j LLM Knowledge Graph Builder is its versatility in configuring the extraction schema. Users can specify the types of nodes and relationships to extract, ensuring that the generated knowledge graph meets their unique requirements. The program also provides post-extraction cleanup capabilities, improving the accuracy and relevance of the data.
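The effect of an extraction schema can be sketched as an allow-list applied to extracted triples. This is a minimal illustration, not the tool's implementation: the `Triple` class, the `filter_by_schema` function, and the sample labels are all hypothetical, and the real application may apply the schema during extraction rather than afterwards.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One extracted relationship: (source) -[rel_type]-> (target)."""
    source: str
    source_type: str
    rel_type: str
    target: str
    target_type: str


def filter_by_schema(triples, allowed_nodes, allowed_relationships):
    """Keep only triples whose node labels and relationship type are allowed.

    An empty allow-list means "no restriction" for that dimension.
    """
    kept = []
    for t in triples:
        nodes_ok = not allowed_nodes or (
            t.source_type in allowed_nodes and t.target_type in allowed_nodes
        )
        rel_ok = not allowed_relationships or t.rel_type in allowed_relationships
        if nodes_ok and rel_ok:
            kept.append(t)
    return kept


# Hypothetical extraction output, for illustration only.
extracted = [
    Triple("Neo4j", "Organization", "DEVELOPS", "LLM Graph Builder", "Product"),
    Triple("LLM Graph Builder", "Product", "USES", "LangChain", "Framework"),
    Triple("Alice", "Person", "LIKES", "Graphs", "Topic"),
]

schema_nodes = {"Organization", "Product", "Framework"}
schema_rels = {"DEVELOPS", "USES"}
kept = filter_by_schema(extracted, schema_nodes, schema_rels)
```

With this schema, the `Person`/`Topic` triple is dropped because neither label appears in the allowed node types.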

The program works well for long English texts, but less well for tabular data such as Excel or CSV files, presentations, and images containing charts. Users can improve extraction quality by tuning the graph schema to match the characteristics of their data.

After building the knowledge graph, users can query the data using several Retrieval-Augmented Generation (RAG) techniques. Retrieval modes such as GraphRAG, Vector, and Text2Cypher support advanced queries and insightful data analysis, and the application also shows which retrieved data was used to produce each response.
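The vector retrieval mode can be illustrated with a small top-k similarity search over chunk embeddings. This is a sketch under stated assumptions: the three-dimensional embeddings are made up by hand, `cosine` and `top_k_chunks` are hypothetical helper names, and a real deployment would query a Neo4j vector index rather than scan a Python list.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k_chunks(query_vec, chunks, k=2):
    """Return the k (score, text) pairs most similar to the query vector."""
    scored = [(cosine(query_vec, c["embedding"]), c["text"]) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy chunks with hand-written embeddings (illustrative values only).
chunks = [
    {"text": "Neo4j stores nodes and relationships.", "embedding": [0.9, 0.1, 0.0]},
    {"text": "FastAPI serves the backend.", "embedding": [0.1, 0.9, 0.0]},
    {"text": "Graphs connect entities.", "embedding": [0.8, 0.2, 0.1]},
]
query = [1.0, 0.0, 0.0]
results = top_k_chunks(query, chunks, k=2)
```

The retrieved chunk texts would then be passed to the language model as context for answering the question.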

Neo4j LLM Knowledge Graph Builder is a highly adaptable application with a Python FastAPI backend and a React-based frontend. It works well on Google Cloud Run, but can also be deployed locally with Docker Compose. The application relies on the llm-graph-transformer module that Neo4j added to the LangChain framework to improve GraphRAG search capabilities and enable smooth integration with other LangChain modules.

Neo4j LLM Knowledge Graph Builder is easy to use and getting started is simple. Here are the steps:

  1. Launch the LLM Knowledge Graph Builder
  2. Create a new AuraDB Free database, download its credentials file, and use it to connect the Builder to your Neo4j (Aura) instance.
  3. Upload your sources: PDFs and other documents, URLs, or S3/GCS buckets.
  4. Build and explore knowledge graphs and interact with your data using conversational questions with GraphRAG.
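For readers who want to reuse the credentials from step 2 in their own scripts, the sketch below parses a credentials file into a dictionary. It assumes the file consists of `KEY=VALUE` lines with keys such as `NEO4J_URI`, `NEO4J_USERNAME`, and `NEO4J_PASSWORD`; that layout, the `parse_credentials` helper, and the sample values are assumptions for illustration, so check them against the file you actually download.

```python
def parse_credentials(text):
    """Parse KEY=VALUE lines, skipping blanks and '#' comments."""
    creds = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        creds[key.strip()] = value.strip()
    return creds


# Hypothetical example values, not real credentials.
sample = """# Example credentials file (assumed format)
NEO4J_URI=neo4j+s://example.databases.neo4j.io
NEO4J_USERNAME=neo4j
NEO4J_PASSWORD=change-me
"""

creds = parse_credentials(sample)
required = ("NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD")
missing = [key for key in required if key not in creds]
```

Checking `missing` before connecting gives a clearer error than a failed login when a key is absent.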

The process begins by uploading the sources, which are stored as document nodes in the graph. Each source is loaded with LangChain Loaders, and its text is split into chunks that are linked to their corresponding document. Chunks that are highly similar to each other are then connected to form a k-nearest neighbor (kNN) graph. An embedding is computed for each chunk and stored on the chunk and in a vector index to enable efficient lookup.
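The chunking and kNN steps can be sketched in a few lines. This is an illustration only: the word-window splitter, the bag-of-words `embed` function (standing in for a real embedding model), and the `knn_edges` helper are hypothetical, and the parameter values are arbitrary.

```python
import math
from collections import Counter


def split_into_chunks(text, chunk_size=8, overlap=2):
    """Split text into overlapping windows of words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(chunk):
    """Toy embedding: a word-count vector (stand-in for a real model)."""
    return Counter(w.lower().strip(".,") for w in chunk.split())


def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_edges(chunks, k=1, min_score=0.0):
    """Connect each chunk to its k most similar chunks above min_score."""
    vectors = [embed(c) for c in chunks]
    edges = set()
    for i, vi in enumerate(vectors):
        scored = [(cosine(vi, vj), j) for j, vj in enumerate(vectors) if j != i]
        scored.sort(reverse=True)
        for score, j in scored[:k]:
            if score > min_score:
                edges.add((min(i, j), max(i, j)))  # store as undirected
    return sorted(edges)


demo_chunks = [
    "graph database stores nodes",
    "graph database stores relationships",
    "video transcript about cooking",
]
edges = knn_edges(demo_chunks, k=1)
```

In the demo, the first two chunks share most of their words and are linked, while the third has no overlap with either and stays unconnected.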

The llm-graph-transformer or diffbot-graph-transformer module is then used to extract entities and relationships from the text of each chunk, and the extracted entities and relationships are linked back to the chunks they came from. This design keeps the data both connected and traceable to its source, which enables sophisticated RAG patterns and insightful data analysis.
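The value of linking entities back to their source chunks can be shown with a small in-memory edge list. Everything here is a hypothetical sketch: `toy_extract_entities` is a simple name matcher standing in for the LLM-based transformer, and the relationship names `PART_OF` and `HAS_ENTITY` are assumed labels chosen for illustration.

```python
def toy_extract_entities(chunk_text, known_entities):
    """Stand-in for an LLM extractor: match known entity names in the text."""
    lowered = chunk_text.lower()
    return [name for name in known_entities if name.lower() in lowered]


def build_graph(document_name, chunks, known_entities):
    """Return (source, relationship, target) edges linking the three layers."""
    edges = []
    for idx, text in enumerate(chunks):
        chunk_id = f"{document_name}#chunk{idx}"
        # Every chunk points back to the document it was split from.
        edges.append((chunk_id, "PART_OF", document_name))
        # Every extracted entity is attached to the chunk that mentions it.
        for entity in toy_extract_entities(text, known_entities):
            edges.append((chunk_id, "HAS_ENTITY", entity))
    return edges


def chunks_mentioning(edges, entity):
    """Provenance lookup: which chunks is this entity linked to?"""
    return [src for src, rel, dst in edges if rel == "HAS_ENTITY" and dst == entity]


demo_chunks = [
    "Neo4j maintains the graph builder.",
    "The backend uses LangChain and Neo4j.",
    "It runs on Cloud Run.",
]
graph = build_graph("doc.pdf", demo_chunks, ["Neo4j", "LangChain"])
neo4j_sources = chunks_mentioning(graph, "Neo4j")
```

Because each entity keeps an edge to its chunk, and each chunk to its document, an answer built from the entity graph can cite the exact passage it relied on.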

In conclusion, Neo4j LLM Knowledge Graph Builder is a major advancement in the data field. The program uses ML algorithms to transform unstructured data into actionable knowledge graphs, opening up new possibilities for enhanced data analysis and improved decision-making. For data scientists and analysts looking to extract maximum value from their data, Neo4j LLM Knowledge Graph Builder's smooth integration, adjustable extraction methods, and strong community support make it an essential tool.

Tanya Malhotra is a final year undergraduate student from the University of Petroleum and Energy Studies, Dehradun, doing a BTech in Computer Science Engineering with specialisation in Artificial Intelligence and Machine Learning.
She is an avid fan of Data Science and has strong analytical and critical thinking skills with a keen interest in learning new skills, group leadership and managing organized work.
