Submitted by TikkunCreation t3_110knl0 in MachineLearning

Here's my personal list of tools I think people will want to know about:

  • You'll probably want an LLM API
    • OpenAI
    • Cohere and others aren't as good, in my experience
    • Anthropic's API isn't publicly available
  • If you're using embeddings
    • If you're working with a lot of items, you'll want a vector database such as Pinecone, Weaviate, or pgvector
  • If you're building Q&A over a document
    • I'd suggest using GPT Index
  • If you need to interact with external data sources (Google searches, database lookups, a Python REPL)
    • I'd suggest using LangChain
  • If you're doing chained prompts
    • Check out Dust (dust.tt) and LangChain
  • If you want to deploy a little app quickly
    • Check out Streamlit
  • If you need to use something like Stable Diffusion or Whisper in your product
    • Check out banana.dev, Modal, Replicate, tiyaro.ai, beam.cloud, Inferrd, or pipeline.ai
  • If you need something to optimize your prompts
    • Check out Humanloop and Everyprompt
  • If you're building models and need an ML framework
    • PyTorch, Keras, TensorFlow
  • If you're deploying models to production
    • Check out MLOps tools like MLflow, Kubeflow, Metaflow, Airflow, Seldon Core, and TensorFlow Serving
  • If you need to check out example projects for inspiration
    • Check out Pinecone's OP Stack, the LangChain gallery, the GPT Index showcase, and the OpenAI Cookbook
  • If you want to browse the latest research, check out arXiv, of course
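
On the embeddings bullet above: here's a rough sketch of the basic operation a vector database handles for you, which is finding the stored vectors closest to a query vector. This is plain Python with brute-force cosine similarity, not how Pinecone, Weaviate, or pgvector are actually implemented (they use indexes so they don't have to scan every item). The three-dimensional vectors and the names `items`, `top_k`, and `doc_*` are made up for illustration; real embeddings would come from an embedding API and have hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(query, items, k=2):
    """Return the names of the k stored vectors most similar to the query."""
    scored = [(cosine_similarity(query, vec), name) for name, vec in items.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [name for _, name in scored[:k]]


# Hypothetical toy "embeddings" (assumption: real ones come from a model).
items = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_taxes": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

print(top_k(query, items, k=2))  # ['doc_cats', 'doc_dogs']
```

With a handful of items this loop is fine. Once you have a lot of items, scanning every vector on every query gets slow, and that's the point where a vector database starts to pay off.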

What am I missing?

107

Comments

big_ol_tender t1_j89k4f7 wrote

Thanks for putting this together. I’d add DeepSparse and SparseZoo for training and deploying sparse models. Also, I can’t vouch for it because I haven’t used it (yet), but DVC (Data Version Control) for ML development.

12

0lecinator t1_j89y7np wrote

For research, Papers with Code and Connected Papers are fantastic.

30

MiuraDude t1_j8aw11t wrote

Qdrant for the vector database and Kern AI refinery for data labeling!

2

thundergolfer t1_j8b2g6w wrote

> If you're deploying models to production

Airflow is not a good tool for ML development. Leave Airflow back in 2018. Also, Modal can handle production model deployment, model pipelines, and inference.

3

thundergolfer t1_j8bjgpu wrote

If you don't have issues then definitely don't bother migrating! Something like Metaflow or Modal is much more built for purpose. Airflow was designed for the Hadoop era of data engineering; it's straining under changes that have happened in the Python, container, and ML ecosystems.

2

RunCodeCook t1_j8bjty1 wrote

Experiment tracking (Weights & Biases, MLflow, Neptune, etc.)

Organizing research papers (Zotero, Paperpile, etc.)

9