Skip to main content

Generative AI

NLP to SQL: Build Structured Data Questioning and Answering Application

Artificial Intelligence Concept Cpu Quantum Computing

Introduction to NLP to SQL

With the availability of powerful large language models, we now can convert natural language into SQL (NLP to SQL) with a single callout, enabling users to express their information needs naturally and efficiently.

Structured data, often residing in databases, requires precise SQL queries for retrieval. However, formulating these queries can be challenging for users unfamiliar with SQL. To bridge this gap, our application aims to interpret natural language questions and convert them into SQL queries.

Azure OpenAI

NLP to SQL is one of the advantages of the Azure OpenAI model that we can leverage for our Data Questioning and Answering application. There are different open-source models available for use, each with its own capabilities and limitations. We have used these same open-source models through API calls while developing the applications. It also involves some pricing and usage considerations.

There are factors that affect the response from these models. Below are the API parameters you can set to get different outputs for the same NLP:

  • Temperature
  • Top P
  • Frequency Penalty
  • Presence Penalty

Prompt

The more accurate the prompt, the better it will generate output. Users can ask anything that is not relevant to a particular database. In the prompt, you can include the expected output’s format to constrain the model to generate/handle NLP queries.

SQL is case-insensitive when it comes to keywords, such as SELECT, FROM, and WHERE. However, it is case-sensitive when it comes to identifiers such as table names, column names, and aliases. When converting natural language queries to SQL, developers need to be careful with the case sensitivity of these identifiers. Create prompt that handle the case sensitivity of table names.

Langchain

Langchain is a tool designed to facilitate natural language processing (NLP) tasks, particularly in the context of structured question-answering systems.

Langchain essentially acts as a facilitator, orchestrating interactions between natural language inputs, prompts, and chat models to enable structured question-answering systems. It provides a framework to streamline the development and deployment of such systems, particularly when dealing with structured data sources like databases.

Applications based on NLP to SQL

We have developed two solutions with different technology stacks.

Solution 1: In this solution we have used PostgreSQL, Azure OpenAI, HTML/JavaScript

https://bitbucket.org/prftdata/structureddataquesans_blog_solution_1/

Solution 2: In this solution we have used Streamlit, PostgreSQL, and LangChain’s natural language processing capabilities to generate SQL queries from user input.

https://bitbucket.org/prftdata/structured-ques-ans-using-langchain

You can modify the technology stack used to develop the above solutions, regardless of UI, LLM, or database. Additionally, you can leverage libraries such as pandas. By experimenting with different combinations of the technology stack, we can enhance the capabilities of the application.

Conclusion

The ability to refine this application further, optimize query translations, and integrate more complex SQL functionalities stands as promising future enhancements.

This project serves as a testament to the fusion of user-friendly interfaces with powerful data manipulation tools, paving the way for more intuitive data exploration and analysis.

Important Links

Thoughts on “NLP to SQL: Build Structured Data Questioning and Answering Application”

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Vishal Chaware

Vishal Chaware is Senior Technical Consultant at Perficient. He is a certified Marketing Cloud Developer with experience in Marketing Cloud API.

More from this Author

Categories
Follow Us