Submitted by michaelthwan_ai t3_11vi82q in MachineLearning
michaelthwan_ai OP t1_jct4sdj wrote
Demo page: https://searchgpt-demo.herokuapp.com/
GitHub: https://github.com/michaelthwan/searchGPT
searchGPT is a search engine / question-answering bot based on an LLM that gives natural-language answers. The footnotes reference the web sources it used. Below that, an explainability view shows how the response relates to those sources.
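The explainability view is only described at a high level here. As a rough illustration (not searchGPT's actual method), one could attribute each answer sentence to its most similar source using word-overlap (Jaccard) similarity; a real system would more likely compare embeddings. The function names `tokenize`, `jaccard`, and `explain` are hypothetical.

```python
import re


def tokenize(text: str) -> set:
    # Lowercase alphanumeric word tokens; a crude stand-in for embeddings.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set, b: set) -> float:
    # |A ∩ B| / |A ∪ B|, defined as 0 when either set is empty.
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def explain(answer_sentences, sources):
    """For each answer sentence, return (sentence, best source index, score)."""
    if not sources:
        raise ValueError("need at least one source")
    source_tokens = [tokenize(s) for s in sources]
    rows = []
    for sentence in answer_sentences:
        tokens = tokenize(sentence)
        scores = [jaccard(tokens, st) for st in source_tokens]
        best = max(range(len(scores)), key=scores.__getitem__)
        rows.append((sentence, best, scores[best]))
    return rows
```

For example, with sources `["Paris is the capital of France.", "The Eiffel Tower was completed in 1889."]`, the answer sentence "The tower was finished in 1889." is attributed to source 1 with a score of 0.625.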
Why grounded, though?
Because an LLM can't learn everything during training, it needs real-time factual information as a reference.
This project tries to reproduce work like Bing and Perplexity AI, which use external references to support the LLM's answers.
The GitHub repo includes some examples of good grounded answers from searchGPT and wrong ungrounded answers from ChatGPT.
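A minimal sketch of the grounding idea, assuming the retrieved sources are already available as dicts with hypothetical `title`, `url`, and `snippet` keys: the sources are numbered in the prompt so the LLM can cite them as footnotes, and the cited numbers can then be pulled back out of the answer. The prompt wording and the function names are illustrative assumptions, not searchGPT's actual prompt.

```python
import re


def build_grounded_prompt(question: str, sources: list) -> str:
    """Number each source so the LLM can cite it as [1], [2], ..."""
    lines = [
        "Answer the question using only the sources below.",
        "Cite sources with footnotes like [1].",
        "",
    ]
    for i, src in enumerate(sources, start=1):
        lines.append(f"[{i}] {src['title']} ({src['url']}): {src['snippet']}")
    lines += ["", f"Question: {question}", "Answer:"]
    return "\n".join(lines)


def extract_citations(answer: str) -> list:
    # Collect the distinct footnote numbers the model actually used, sorted.
    return sorted({int(n) for n in re.findall(r"\[(\d+)\]", answer)})
```

The citation list is what lets a UI render footnotes that link back to each source URL.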
rowleboat t1_jctpu8c wrote
Can this use a SQL database as an external reference?
Tostino t1_jctq5az wrote
Look into llama-index
michaelthwan_ai OP t1_jctv2tm wrote
Thank you.
Based on advice from people close to me and my own googling, my choice of indexer evolved like this:
pyterrier -> faiss -> native embedding
Then I found llama-index, but it doesn't currently add extra value for me, so I didn't adopt it.
I have stories about the pros/cons of those libs...
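"Native embedding" presumably means computing similarity over embedding vectors directly, without an indexing library. A minimal sketch under that assumption: brute-force cosine similarity over toy vectors. A real setup would obtain the vectors from an embedding model; the vectors and the function names `cosine` and `top_k` are hypothetical.

```python
import math


def cosine(u: list, v: list) -> float:
    # Cosine similarity; returns 0.0 if either vector has zero length.
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query_vec: list, doc_vecs: list, k: int = 2) -> list:
    """Brute-force nearest neighbours: score every document, keep the k best."""
    scored = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    scored.sort(reverse=True)
    return [i for _, i in scored[:k]]
```

Brute force scores every document per query, which is fine for the handful of snippets one web search returns; libraries such as faiss mainly pay off when the collection gets large.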
michaelthwan_ai OP t1_jctvcas wrote
Theoretically yes, but exactly what you want to achieve matters a lot.
SQL databases don't support similarity/elastic search, which is very useful for natural language. That may limit what you can do or make your product weaker.
Secret-Fox-5238 t1_jcv5dhh wrote
This is completely false. Elastic was invented by SQL. You use things like “LIKE” and a few other choice keywords. Just google them, or go to Microsoft directly and look at SQL SELECT statements. You can string together CTEs, which immediately gives you elasticity. So, sorry, but this is a nonsensical response.
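To make the LIKE/CTE suggestion concrete, here is a minimal sketch using Python's built-in `sqlite3` module; the table, rows, and function names are made up for illustration. Several parameterized LIKE patterns are combined inside a CTE to do keyword matching. Note that LIKE matches literal substrings, so a paraphrased query that shares no text with a row returns nothing.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory toy table; ids are assigned 1, 2, 3 in insertion order.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body TEXT)")
    conn.executemany(
        "INSERT INTO docs (body) VALUES (?)",
        [
            ("Faiss is a library for efficient similarity search.",),
            ("PyTerrier is an information retrieval framework.",),
            ("Gradio builds quick web UIs for ML demos.",),
        ],
    )
    return conn


def keyword_search(conn: sqlite3.Connection, terms: list) -> list:
    """Return ids of rows whose body contains any term (substring match)."""
    if not terms:
        return []
    # One LIKE clause per term, OR-ed together inside a CTE.
    where = " OR ".join("body LIKE ?" for _ in terms)
    sql = f"WITH hits AS (SELECT id FROM docs WHERE {where}) SELECT id FROM hits ORDER BY id"
    params = [f"%{t}%" for t in terms]
    return [row[0] for row in conn.execute(sql, params)]
```

In SQLite, LIKE is case-insensitive for ASCII letters by default, so `keyword_search(conn, ["SEARCH"])` still finds row 1.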
michaelthwan_ai OP t1_jcws6h8 wrote
ChatGPT said what I wanted to say.
>I apologize for any confusion or misinformation in my previous response. You are correct that SQL databases do support various text search and similarity matching features, including the use of keywords like LIKE and CTE (Common Table Expressions) to enable more flexible and efficient querying.
>
>While it's true that specialized tools like Elasticsearch, Solr, or Algolia may offer additional features and performance benefits for certain natural language processing tasks, SQL databases can still be a powerful and effective tool for storing and querying structured and unstructured data, including text data.
>
>Thank you for bringing this to my attention and allowing me to clarify my previous response.
KingsmanVince t1_jctpr5l wrote
Not sure whether this is a frontend problem, but the Python code is printed without indentation.
michaelthwan_ai OP t1_jcturwz wrote
I believe it is a frontend problem. We are not frontend developers, but we thought Gradio was too plain to show the results, so we built a minimal UI.
Markdown code blocks (``` <code> ```) currently aren't pretty-printed the way ChatGPT does it.
phazei t1_jcvcn08 wrote
If you can have it add a class and set "white-space: pre" for that class in the CSS, that should fix it if it's just a frontend issue.
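A minimal sketch of that suggestion, assuming the answer arrives as Markdown text and is inserted into the page as HTML: each fenced code block becomes an element carrying a class (the hypothetical `code-block`), and a CSS rule with `white-space: pre` keeps the indentation. This is not searchGPT's frontend code; a real page would also need to escape or sanitize the text outside the fences.

```python
import html
import re

TICKS = "`" * 3  # the Markdown code-fence delimiter
FENCE = re.compile(TICKS + r"(\w*)\n(.*?)" + TICKS, re.DOTALL)

# Stylesheet rule: preserve spaces and newlines inside code blocks.
CSS = ".code-block { white-space: pre; font-family: monospace; }"


def render_code_fences(text: str) -> str:
    """Replace each fenced code block with a <div class="code-block">.

    Only the code inside the fences is HTML-escaped; text outside
    the fences is passed through unchanged.
    """
    def repl(match):
        lang = match.group(1)  # matched by \w*, so safe inside the attribute
        body = html.escape(match.group(2))
        return f'<div class="code-block" data-lang="{lang}"><code>{body}</code></div>'

    return FENCE.sub(repl, text)
```

A plain `<pre>` element preserves whitespace by default; the class-plus-CSS route is useful when the existing UI renders code inside ordinary `div`s.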