Today, we are excited to announce the general availability of Databricks Assistant Autocomplete on all cloud platforms. Assistant Autocomplete provides personalized AI-powered code suggestions as-you-type for both Python and SQL.
Directly integrated into the notebook, SQL editor, and AI/BI Dashboards, Assistant Autocomplete suggestions blend seamlessly into your development flow, allowing you to stay focused on your current task.
We are excited to bring these productivity improvements to everyone. Over the coming weeks, we'll be enabling Databricks Assistant Autocomplete across eligible workspaces.
Compound AI refers to AI systems that combine multiple interacting components to tackle complex tasks, rather than relying on a single monolithic model. These systems integrate various AI models, tools, and processing steps to form a holistic workflow that is more flexible, performant, and adaptable than traditional single-model approaches.
Assistant Autocomplete is a compound AI system that intelligently leverages context from related code cells, relevant queries and notebooks using similar tables, Unity Catalog metadata, and DataFrame variables to generate accurate and context-aware suggestions as you type.
Our Applied AI team utilized Databricks and Mosaic AI frameworks to fine-tune, evaluate, and serve the model, targeting accurate domain-specific suggestions.
Consider a scenario where you've created a simple metrics table with the following columns:
Assistant Autocomplete makes it easy to compute the click-through rate (CTR) without needing to manually recall the structure of your table. The system uses retrieval-augmented generation (RAG) to provide contextual information on the table(s) you're working with, such as its column definitions and recent query patterns.
For example, with table metadata, a simple query like this would be suggested:
If you've previously computed click rate using a percentage, the model may suggest the following:
Using RAG for additional context keeps responses grounded and helps prevent model hallucinations.
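To make the retrieval step concrete, here is a minimal sketch of how table metadata and recent queries might be assembled into the context a completion model sees. It is an illustration only, not the production pipeline: `TableContext`, `build_prompt`, the comment-style prompt layout, and the `metrics` table with its `date`, `click_count`, and `show_count` columns are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TableContext:
    """Hypothetical container for context retrieved about one table."""
    name: str
    columns: dict[str, str]  # column name -> column type
    recent_queries: list[str] = field(default_factory=list)


def build_prompt(code_prefix: str, tables: list[TableContext], max_queries: int = 2) -> str:
    """Prepend retrieved table context, as SQL comments, to the code typed so far."""
    lines = []
    for table in tables:
        cols = ", ".join(f"{col} {typ}" for col, typ in table.columns.items())
        lines.append(f"-- table {table.name}({cols})")
        # Include only a few recent queries to keep the prompt short.
        for query in table.recent_queries[:max_queries]:
            lines.append(f"-- recent query: {query}")
    lines.append(code_prefix)
    return "\n".join(lines)


# Hypothetical table and query history, for illustration only.
metrics = TableContext(
    name="metrics",
    columns={"date": "DATE", "click_count": "INT", "show_count": "INT"},
    recent_queries=["SELECT click_count * 100.0 / show_count AS click_pct FROM metrics"],
)
prompt = build_prompt("SELECT ", [metrics])
print(prompt)
```

Because the column definitions and an earlier percentage-style query both sit in the prompt, a model completing `SELECT ` has the information it needs to propose a CTR expression that matches the table and the user's past conventions.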
Let's analyze the same table using PySpark instead of SQL. Assistant Autocomplete uses runtime variables to detect the schema of the DataFrame, so it knows which columns are available.
For example, you may want to compute the average click count per day:
In this case, the system uses the runtime schema to offer suggestions tailored to the DataFrame.
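As a rough illustration of why a runtime schema helps, the sketch below narrows column candidates from a list of `(name, type)` pairs, which is the shape PySpark's `DataFrame.dtypes` returns. The helper names and the example schema are hypothetical, and this is plain Python rather than a description of how the product ranks suggestions.

```python
# Type names treated as numeric in this sketch (an assumption, not an exhaustive list).
NUMERIC_TYPES = {"tinyint", "smallint", "int", "bigint", "float", "double"}


def suggest_columns(schema, typed_prefix):
    """Return column names in the schema that start with what the user has typed."""
    prefix = typed_prefix.lower()
    return [name for name, _type in schema if name.lower().startswith(prefix)]


def numeric_columns(schema):
    """Return columns that make sense inside an aggregate such as avg()."""
    return [name for name, col_type in schema if col_type in NUMERIC_TYPES]


# Hypothetical schema, in the same shape as DataFrame.dtypes.
schema = [("date", "date"), ("click_count", "int"), ("show_count", "int")]

print(suggest_columns(schema, "cl"))
print(numeric_columns(schema))
```

With the schema in hand, a completion for an average-per-day computation can be limited to real columns: `date` for grouping and a numeric column such as `click_count` for the aggregate.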
While many code completion LLMs excel at general coding tasks, we specifically fine-tuned the model for the Databricks ecosystem. This involved continued pre-training of the model on publicly available notebook/SQL code to focus on common patterns in data engineering, analytics, and AI workflows. By doing so, we've created a model that understands the nuances of working with big data in a distributed environment.
To ensure the quality and relevance of our suggestions, we evaluate the model using a suite of commonly used coding benchmarks such as HumanEval, DS-1000, and Spider. However, while these benchmarks are useful in assessing general coding abilities and some domain knowledge, they don't capture all the Databricks capabilities and syntax. To address this, we developed a custom benchmark with hundreds of test cases covering some of the most commonly used packages and languages in Databricks. This evaluation framework goes beyond general coding metrics to assess performance on Databricks-specific tasks as well as other quality issues that we encountered while using the product.
If you are interested in learning more about how we evaluate the model, check out our recent post on evaluating LLMs for specialized coding tasks.
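A benchmark of this kind can be pictured as a list of test cases scored against a completion function. The sketch below is a simplified assumption, not our evaluation framework: `CompletionCase`, `pass_rate`, the exact-match scoring rule, and the stub model are all hypothetical.

```python
from typing import Callable, List, NamedTuple, Optional


class CompletionCase(NamedTuple):
    """One hypothetical benchmark case: code around the cursor and the ideal completion."""
    prefix: str
    suffix: str
    expected: str  # an empty string means the model should suggest nothing
    tag: str


def pass_rate(
    model: Callable[[str, str], str],
    cases: List[CompletionCase],
    tag: Optional[str] = None,
) -> float:
    """Fraction of cases, optionally filtered by tag, where the completion matches exactly."""
    selected = [c for c in cases if tag is None or c.tag == tag]
    if not selected:
        raise ValueError("no test cases selected")
    passed = sum(model(c.prefix, c.suffix).strip() == c.expected.strip() for c in selected)
    return passed / len(selected)


def stub_model(prefix: str, suffix: str) -> str:
    """Stand-in for a real model: abstains only when the statement ends with a semicolon."""
    return "" if prefix.rstrip().endswith(";") else " FROM metrics"


cases = [
    CompletionCase("SELECT * FROM metrics;", "", "", "sql_empty"),
    CompletionCase("SELECT 1", "", "", "sql_empty"),
    CompletionCase("SELECT date", "", "FROM metrics", "sql"),
]

print(pass_rate(stub_model, cases, tag="sql_empty"))
print(pass_rate(stub_model, cases))
```

Tagging cases makes it possible to track a narrow behavior, such as empty-response handling, separately from the overall score.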
There are often cases when the context is sufficient as is, making it unnecessary to provide a code suggestion. As shown in the following examples from an earlier version of our coding model, when the queries are already complete, any additional completions generated by the model could be unhelpful or distracting.
In all of the examples above, the ideal response is an empty string. While the model would sometimes generate an empty string, cases like the ones above were common enough to be a nuisance. The model needs to know when to abstain, that is, to produce no output and return an empty completion.
To achieve this, we introduced a fine-tuning trick: we forced 5-10% of the training examples to consist of an empty middle span at a random location in the code. The thinking was that this would teach the model to recognize when the code is complete and a suggestion isn't necessary. This approach proved highly effective. For the SQL empty-response test cases, the pass rate went from 60% to 97% without impacting performance on the other coding benchmarks. More importantly, once we deployed the model to production, there was a clear step increase in the code suggestion acceptance rate. This fine-tuning enhancement translated directly into noticeable quality gains for users.
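The idea can be sketched as a data-preparation step for fill-in-the-middle training. This is a simplified illustration under stated assumptions, not the actual training code: `make_fim_example` is a hypothetical helper, splits are made at character level, and the 8% rate is an arbitrary value inside the 5-10% range mentioned above.

```python
import random


def make_fim_example(code, rng, empty_middle_rate=0.08):
    """Split code into (prefix, middle, suffix) for fill-in-the-middle training.

    With probability empty_middle_rate, the middle span is forced to be empty
    at a random location, so the target for that example is "suggest nothing".
    """
    if not code:
        raise ValueError("code must be non-empty")
    if rng.random() < empty_middle_rate:
        cut = rng.randint(0, len(code))  # any position, including both ends
        return code[:cut], "", code[cut:]
    # Otherwise pick a non-empty middle span for the model to reconstruct.
    start = rng.randrange(len(code))
    end = rng.randint(start + 1, len(code))
    return code[:start], code[start:end], code[end:]


rng = random.Random(0)
snippet = "SELECT date, click_count FROM metrics;"  # hypothetical training snippet
examples = [make_fim_example(snippet, rng) for _ in range(10_000)]
empty_fraction = sum(1 for _, middle, _ in examples if middle == "") / len(examples)
print(f"empty-middle fraction: {empty_fraction:.3f}")
```

Every example still reassembles to the original snippet, so the only change to the training distribution is how often the correct answer is an empty completion.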
Given the real-time nature of code completion, efficient model serving is crucial. We leveraged Databricks' optimized GPU-accelerated model serving endpoints to achieve low-latency inference while controlling GPU usage cost. This setup allows us to deliver suggestions quickly, ensuring a smooth and responsive coding experience.
As a data and AI company focused on helping enterprise customers extract value from their data to solve the world's toughest problems, we firmly believe that both the companies developing the technology and the companies and organizations using it need to act responsibly in how AI is deployed.
We designed Assistant Autocomplete from day one to meet the demands of enterprise workloads. Assistant Autocomplete respects Unity Catalog governance and meets compliance standards for certain highly regulated industries. It also respects geo restrictions and can be used in workspaces that process Protected Health Information (PHI) data. Your data is never shared across customers and is never used to train models. For more detailed information, see Databricks Trust and Safety.
Databricks Assistant Autocomplete is available across all clouds at no additional cost and will be enabled in workspaces in the coming weeks. Users can enable or disable the feature in developer settings: