How to Create an LLM with HubSpot Data: A Complete Guide

December 4, 2024
20 min read

While running a business, you need to employ CRM applications to streamline customer services, marketing, and sales operations. One of the most popular CRM platforms is HubSpot, which enables you to manage your business through its user-friendly features at an affordable cost.

However, to improve your business workflow further, you can create LLMs with HubSpot data. This will automate most of your customer management tasks, allowing you to focus on more productive business aspects.

Understanding the Foundation

HubSpot is an AI-powered customer relationship management (CRM) platform that offers software and resources for managing marketing, sales, and customer services. You can utilize them to automate your business operations, increasing productivity and profits. The services offered by HubSpot are known as Hubs, and there are six prominent Hubs—Marketing Hub, Sales Hub, Service Hub, Content Hub, Operations Hub, and Commerce Hub.

To further enhance the benefits of HubSpot, you can create LLM with HubSpot data:

HubSpot Data Structure Overview

HubSpot data structure or data model is a framework that enables you to organize CRM data within the platform. The key components of the HubSpot database are:

  • Objects: The objects represent various relationships or processes that are present within a business. Every HubSpot account consists of four standard objects: contacts, companies, deals, and tickets.
  • Records: A single instance of an object is called a record, and its information is stored in properties. You can associate records across different objects to capture how they relate to one another.
  • Properties: Properties are different fields that allow you to store information within a record. For each object, there are different properties included in your account, but you can also create your own custom properties.

For example, James Smith is a contact record, and jamesmith@email.com is the value of his Email contact property. James’s company, Paper.Inc., is a company record that is associated with the James Smith contact record. If James interacts with your sales and support teams, you can create deals and tickets and link them to both James and his company.

LLM Capabilities for CRM Data

LLMs trained on CRM data should have some specific capabilities to help you enhance your business growth. First is contextual understanding to interpret customer data such as purchase history or engagement patterns to generate accurate results. The second is predictive analytics, through which LLM should be able to predict customer behavior in advance. This helps in understanding churn rates and also facilitates lead generation, resulting in increased sales.

Lastly, the LLM created with HubSpot data should resolve customer queries 24/7 in real-time.

Use Cases and Business Value

LLMs help you automate routine tasks such as extraction of information from emails, contracts, or notes, reducing manual efforts in processes such as data entry and lead generation. They assist you in optimizing sales by analyzing high-volume business data to understand customer preferences, purchase history, or industry trends.

Additionally, with LLMs, you can improve customer communication by generating tailored responses for emails or chats.

Privacy and Compliance Basics

To ensure data privacy while creating LLM with HubSpot data, you should adopt some security measures. Encrypt the CRM data at rest and in transit using security protocols to protect sensitive customer data and ensure compliance with data regulatory frameworks like GDPR or HIPAA.

You can further implement role-based access control (RBAC) to determine who can and cannot view data through the LLM. For this, you can rely on the access controls offered by the platform that hosts your model, whether that is a hosted service such as OpenAI’s GPT or your own deployment of an open model such as Meta’s Llama. You can also implement your own authentication mechanism at the organizational level.

Resource Requirements

To create LLM with HubSpot data, arrange all the required resources, such as:

  • Data management systems or data warehouses.
  • Technical infrastructure, including high-performance GPUs/TPUs.
  • LLM frameworks like LangChain, LlamaIndex, or Hugging Face.
  • Data integration and transformation tools for streamlining training datasets.

Environment Setup

To create an LLM with HubSpot data, you need a suitable environment, which can be achieved by following the steps below:

HubSpot API Authentication

To extract CRM data from HubSpot, you can connect your target data system with HubSpot through API using the following two authentication methods:

OAuth

OAuth is suggested if you want multiple customers to use your LLM or list it on the HubSpot App Marketplace. You need to include the OAuth access token in the authorization header to make a request using OAuth.


curl --header "Authorization: Bearer C4d***sVq" \
  "https://api.hubapi.com/crm/v3/objects/contacts?limit=10&archived=false"

Here, C4d***sVq is the (masked) OAuth access token for the HubSpot account, and Bearer is the authorization scheme that precedes it.

Private App Access Tokens

Opt for this if the LLM application that you are creating will be used internally only by your organization. To make a request using the private app access token, you can include the token in the authorization header.


curl --header "Authorization: Bearer ***-***-*********-****-****-****-************" \
  "https://api.hubapi.com/crm/v3/objects/contacts?limit=10&archived=false"

To start exporting data from your HubSpot account, you can make a POST request to /crm/v3/exports/export/async. Specify the file format, objects, and properties that you want to export.
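The curl requests above can be mirrored in Python. The following is a minimal sketch using only the standard library; `build_contacts_request` is a hypothetical helper name and the token is a placeholder. It builds the request without sending it, so you can inspect the URL and header first.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.hubapi.com"


def build_contacts_request(token: str, limit: int = 10) -> Request:
    """Build (but do not send) a GET request for contact records."""
    query = urlencode({"limit": limit, "archived": "false"})
    url = f"{BASE_URL}/crm/v3/objects/contacts?{query}"
    # OAuth and private app access tokens both go in the same Bearer header.
    return Request(url, headers={"Authorization": f"Bearer {token}"})


request = build_contacts_request("YOUR_ACCESS_TOKEN")  # placeholder token
# To send it, pass `request` to urllib.request.urlopen and decode the JSON body.
```

Keeping request construction separate from sending makes it easy to swap the token type (OAuth or private app) without touching the rest of your extraction code.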

Development Environment

Set up the programming language environment in which you want to query the HubSpot data and train on it. Ensure the availability of libraries like PyTorch, TensorFlow, and Hugging Face Transformers, along with any essential SDKs.

Required Permissions

Log in to your HubSpot account and configure permissions for various CRM objects and activities. These include contacts, companies, deals, tickets, CRM emails, and calls.

Testing Setup

Build a Standard Sandbox Environment

A Sandbox is an isolated environment where you can test the functionality of your code or applications without affecting the main system. In HubSpot, Super Admins can create a standard sandbox account. A Super Admin user has access to all the HubSpot tools and settings. Once the sandbox is created, you can sync specific data from the main account to test new integrations. This approach ensures that tests are conducted independently without impacting the data in the main account.

Security Configurations

For safe interaction between HubSpot and other data systems, you can perform API security testing. This involves tests for user authentication, parameter tampering, injection, and unhandled HTTP methods, as well as fuzz testing.

HubSpot Data Architecture


The data architecture of HubSpot consists of the following components:

Contact Records

The contacts consist of information about individuals who interact with your business. You can use the contacts endpoints to create and manage contact records in your HubSpot account and sync data between HubSpot and other systems.

Company Information

The company information data includes details such as name, address, establishment year, or industry category. HubSpot gathers this data through web crawling or crowd-sourcing and stores it in a database called HubSpot Insights.

Deal Pipelines

Deal pipelines enable you to visualize your sales process to track constraints in the selling process and get an estimate of revenue. The pipelines consist of the steps that indicate the movement of a sales opportunity toward closing.

Marketing Data

Ads, campaigns, emails, social media posts, and SMS are the major components of HubSpot marketing data. You can analyze metrics obtained from marketing data to understand the performance of your campaign and areas of improvement.

Service Tickets

Using service tickets, you can organize all the customer inquiries in a centralized way. To create tickets, you can utilize the ticket index page, contact records, or conversation box in your HubSpot account.

Custom Objects

Contacts, companies, deals, and tickets are standard objects. If your business requires some processes or attributes other than these objects, then HubSpot allows you to create custom objects. To create custom objects, you need to subscribe to the Enterprise version of the HubSpot account. Here, you can define custom objects in custom object settings or through API.

Data Collection Strategy

To create LLM with HubSpot data, you need to first collect and load it to a unified location. The target data system may include cloud data storage systems, data lakes, data warehouses, or vector databases. Some of the strategies through which you can collect data are as follows:

API Endpoints Utilization

Each object in HubSpot has its own specific API endpoints. These endpoints allow you to extract different types of data, such as contact records, marketing data, company information, or service tickets. The HubSpot API also supports pagination, which breaks a large dataset into smaller chunks, or pages, so that you can retrieve it through a series of bounded requests instead of one oversized response.
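As a rough illustration of pagination, the sketch below follows a cursor from page to page. It assumes each response carries its records under `results` and the next cursor under `paging.next.after`; `fetch_page` is a caller-supplied function, hypothetical here, that performs the HTTP call. A small in-memory stand-in replaces the API so the example runs offline.

```python
from typing import Callable, Dict, Iterator, Optional


def iter_records(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield every record by following the `after` cursor page by page."""
    after: Optional[str] = None
    while True:
        page = fetch_page(after)
        yield from page.get("results", [])
        # The cursor for the next page, if any, sits under paging.next.after.
        after = page.get("paging", {}).get("next", {}).get("after")
        if after is None:
            break


# Stand-in for the API: two pages keyed by the cursor that requests them.
_FAKE_PAGES = {
    None: {"results": [{"id": "1"}, {"id": "2"}], "paging": {"next": {"after": "2"}}},
    "2": {"results": [{"id": "3"}]},
}

all_ids = [record["id"] for record in iter_records(lambda after: _FAKE_PAGES[after])]
```

Because `iter_records` is a generator, downstream steps can start processing the first page before the last one has been fetched.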

Webhook Implementation


A webhook is an HTTP request that is triggered by an event, allowing you to transfer data between source and target data systems. You can use webhooks in HubSpot to gather CRM data by subscribing to events in your HubSpot account, such as the creation or update of a contact.
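To make the idea concrete, here is a minimal sketch of what a webhook receiver might do with an incoming request body. It assumes the body is a JSON array of events, each with a `subscriptionType` and an `objectId`; treat these field names as assumptions to verify against your own subscription, and note that the sample values are made up.

```python
import json
from collections import defaultdict
from typing import Dict, List


def group_events(body: str) -> Dict[str, List[int]]:
    """Group object IDs from a webhook request body by event type."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for event in json.loads(body):
        grouped[event["subscriptionType"]].append(event["objectId"])
    return dict(grouped)


# Illustrative payload; field names are assumed, values are invented.
sample_body = json.dumps([
    {"subscriptionType": "contact.creation", "objectId": 101},
    {"subscriptionType": "contact.propertyChange", "objectId": 101,
     "propertyName": "email", "propertyValue": "new@example.com"},
    {"subscriptionType": "contact.creation", "objectId": 102},
])

events = group_events(sample_body)
```

Grouping by event type lets you decide per type whether to fetch the full record, update a single field, or ignore the event.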

Incremental Sync

Using the incremental sync method, you can reduce the resources and time required for data collection. This process allows you to only sync the data that has changed since the last sync, ensuring data consistency.
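One common way to implement this is to keep a checkpoint timestamp and select only the records modified after it. The sketch below assumes each record exposes an ISO 8601 `updatedAt` field; the helper names are hypothetical.

```python
from datetime import datetime
from typing import Dict, List, Tuple


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def select_changed(records: List[Dict], last_sync: str) -> Tuple[List[Dict], str]:
    """Return the records updated after `last_sync` and the new checkpoint."""
    cutoff = parse_ts(last_sync)
    changed = [r for r in records if parse_ts(r["updatedAt"]) > cutoff]
    # Advance the checkpoint to the newest change seen, or keep the old one.
    newest = max((r["updatedAt"] for r in changed), key=parse_ts, default=last_sync)
    return changed, newest


records = [
    {"id": "1", "updatedAt": "2024-11-01T08:00:00.000Z"},
    {"id": "2", "updatedAt": "2024-12-02T09:30:00.000Z"},
    {"id": "3", "updatedAt": "2024-12-03T12:00:00.000Z"},
]
changed, checkpoint = select_changed(records, "2024-12-01T00:00:00.000Z")
```

Persist the returned checkpoint after each successful run so that the next run starts where this one ended.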

Batch Processing

Batch processing facilitates periodic collection of data in batches and is extremely useful for collecting large amounts of data. LLMs are trained on high-volume datasets, so batch processing can be a useful technique for extracting large-scale HubSpot data in batches.
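At its simplest, batching means cutting a stream of records into fixed-size groups before loading them. The following sketch shows that step in isolation; `batched` is an illustrative helper, and the batch size is arbitrary.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(items: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


# Seven record IDs split into batches of three.
batches = list(batched(range(1, 8), 3))
```

The last batch may be smaller than the rest, so downstream loaders should not assume a fixed length.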

For effective batch processing, you can opt to use a data movement platform like Airbyte. It offers a vast library of 400+ pre-built connectors, allowing you to collect data in batches from HubSpot and transfer it to any desired destination data system.


Some additional key features of Airbyte are:

  • AI-powered Connector Creation: You can create custom connectors using Connector Builder with the AI assistant feature. The AI assistant automatically prefills the required configuration fields and provides intelligent solutions to fine-tune the connector building process.
  • Schema Management: With Airbyte, you can specify how it should handle source schema changes to ensure accurate data sync. The Airbyte Cloud service automatically checks for schema changes in the source every 15 minutes. In contrast, the Self-hosted version performs schema checks every 24 hours.

Historical Data Handling

You must utilize historical CRM data from HubSpot while creating LLMs. The historical data consists of all the past information related to contacts, companies, deals, and campaigns. This information aids LLM in better pattern recognition and predictive analytics.

Rate Limit Management

A rate limit is a constraint imposed by an API on how many requests you can make within a given period. Most LLM providers allow you to set rate limits according to your requirements. For HubSpot, the rate limits for accessing APIs differ depending on your subscription type.
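A typical way to stay within a rate limit is to retry with exponential backoff whenever the API answers with HTTP 429 (Too Many Requests). The sketch below is one possible shape for that logic; `send` is a caller-supplied function, assumed here, that returns a status code and a decoded body, and the delays are illustrative.

```python
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, dict]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call `send` until it stops returning HTTP 429 or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return body
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"still rate limited after {max_attempts} attempts")
```

Passing `sleep` as a parameter keeps the function testable: in tests you can record the delays instead of actually waiting.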

Data Preparation

After collecting data, you need to prepare it to avoid data biases and hallucinations in LLM output. Some of the steps you can take for this are as follows:

Field Selection

Field selection involves choosing suitable columns, attributes, or data points from the dataset. This ensures that the training data aligns with the objectives that you want to achieve using LLM. For example, if you want to use the LLM for sales support, you should choose fields such as customer demographics, deal stages, or product categories.
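In code, field selection can be as simple as projecting each record onto a fixed list of property names. The sketch below assumes records nest their values under a `properties` key; the three deal properties are only an example of a sales-support selection.

```python
from typing import Dict, Iterable, Optional

# Example selection for a sales-support use case; adjust to your own goals.
SALES_FIELDS = ("dealname", "dealstage", "amount")


def select_fields(
    record: Dict, fields: Iterable[str] = SALES_FIELDS
) -> Dict[str, Optional[str]]:
    """Keep only the chosen properties; absent ones become None."""
    properties = record.get("properties", {})
    return {name: properties.get(name) for name in fields}


deal = {
    "id": "9",
    "properties": {
        "dealname": "Renewal",
        "dealstage": "closedwon",
        "amount": "1200",
        "hs_object_id": "9",
    },
}
selected = select_fields(deal)
```

Returning `None` for absent properties keeps every row the same shape, which simplifies the cleaning step that follows.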

Data Cleaning

This involves identifying duplicate, missing, and outlier data records and removing or correcting them to make the training dataset consistent. A cleaned dataset enables you to train an LLM that produces high-quality responses.
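The sketch below covers two of these cleaning steps for contact rows: dropping rows with a missing email and removing duplicates after normalizing the email. Outlier handling is left out, and using the email as the deduplication key is an assumption that may not suit every dataset.

```python
from typing import Dict, List


def clean_contacts(rows: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Drop rows without an email and keep the first row per email."""
    seen = set()
    cleaned = []
    for row in rows:
        # Normalize so that "James@Email.com " and "james@email.com" match.
        email = (row.get("email") or "").strip().lower()
        if not email or email in seen:
            continue
        seen.add(email)
        cleaned.append({**row, "email": email})
    return cleaned
```

Keeping the first occurrence is a simple policy; if your rows carry a modification date, you may prefer to keep the most recent one instead.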

Property Mapping

HubSpot's property field types include text, number, date picker, datetime, single checkbox, and drop-down list. To ensure proper data sync, you can map these properties to the corresponding fields in the target data system.
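A mapping like this can be expressed as a lookup table from source type to target column type. In the sketch below, the source type names and the generic SQL types on the right are both assumptions; check them against the property definitions in your account and against your destination's type system.

```python
from typing import Dict

# Assumed source type names mapped to generic SQL column types.
TYPE_MAP: Dict[str, str] = {
    "string": "TEXT",
    "number": "NUMERIC",
    "date": "DATE",
    "datetime": "TIMESTAMP",
    "bool": "BOOLEAN",
    "enumeration": "TEXT",
}


def target_schema(property_types: Dict[str, str]) -> Dict[str, str]:
    """Map each property name to a column type, defaulting to TEXT."""
    return {name: TYPE_MAP.get(kind, "TEXT") for name, kind in property_types.items()}


schema = target_schema({
    "email": "string",
    "amount": "number",
    "closedate": "datetime",
    "custom_field": "something_unknown",
})
```

Falling back to `TEXT` for unknown types keeps the sync from failing on a custom property, at the cost of losing type information for that column.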

Relationship Handling

Many times, different objects in the HubSpot data are related to each other. To manage relationships between objects, you can associate their records. Handling these relationships effectively facilitates building an LLM that generates more contextually accurate results.
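One way to carry these relationships into LLM input is to resolve associations into short text statements. The sketch below uses the contact and company from the earlier example; the dictionary layout and the `(contact_id, company_id)` pairs are a simplified, hypothetical representation of associated records.

```python
from typing import Dict, List, Tuple


def contacts_with_company(
    contacts: Dict[str, Dict],
    companies: Dict[str, Dict],
    associations: List[Tuple[str, str]],
) -> List[str]:
    """Render one statement per contact-company association."""
    lines = []
    for contact_id, company_id in associations:
        contact = contacts.get(contact_id)
        company = companies.get(company_id)
        if contact is None or company is None:
            continue  # skip associations that point at missing records
        lines.append(
            f"{contact['name']} ({contact['email']}) "
            f"is associated with the company {company['name']}"
        )
    return lines


contacts = {"c1": {"name": "James Smith", "email": "jamesmith@email.com"}}
companies = {"k1": {"name": "Paper.Inc."}}
statements = contacts_with_company(contacts, companies, [("c1", "k1"), ("c1", "k9")])
```

Statements like these can be stored alongside the raw records so that retrieval returns the relationship as well as the individual fields.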

Timeline Events

You can use the CRM extensions in HubSpot to add information from other systems to HubSpot objects. This is done by creating custom timeline events using timeline events API endpoints. Timeline events can help you add extra information to your HubSpot data that can be essential for optimized LLM outcomes.

Activity Logs

Activity logs are the records of activities that a customer performs, including login frequency, messages sent, survey responses, or tickets raised. During the preparation process, you can clean and transform this data to convert it into a format suitable for LLM applications.

LLM Integration Framework with Airbyte

When using Airbyte, you can integrate it with LLM frameworks like LangChain or LlamaIndex and perform RAG techniques such as chunking or indexing. These techniques enable you to optimize LLM results.

Airbyte also allows you to directly load semi-structured and unstructured data to eight different vector databases. This includes vector stores like Pinecone, Weaviate, Milvus, and Qdrant. You can integrate these vector databases with pre-built LLM providers to generate vector embeddings that aid in semantic and contextual search and retrieval operations.
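To illustrate what chunking does before text is embedded, here is a minimal word-window chunker with overlap. It is a generic sketch, not the splitter used by Airbyte, LangChain, or LlamaIndex, and the default sizes are arbitrary.

```python
from typing import List


def chunk_words(text: str, chunk_size: int = 50, overlap: int = 10) -> List[str]:
    """Split text into word windows of `chunk_size` that share `overlap` words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the current window already reaches the end of the text
    return chunks


chunks = chunk_words("a b c d e f g h i j", chunk_size=4, overlap=1)
```

The overlap repeats a few words between neighboring chunks so that a sentence cut at a boundary still appears whole in at least one of them.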

The crucial processes involved in choosing the right LLM frameworks are as follows:

Model Selection

To select a suitable model for building your LLM, you first need to identify what you want the LLM to do. Then, weigh other factors such as model size, data availability, and computational resources. By defining the LLM's purpose in advance and evaluating your existing infrastructure, you can narrow down your choices and select a specific model. Some prominent models you can build on are GPT-4, Llama, and BERT.

Context Engineering

Context engineering involves clearly defining the context in the input data to help LLM generate accurate results. A good contextual understanding enables LLM to give relevant results that align with your business objectives.

Prompt Design

Prompt designing involves crafting clear and concise instructions to be given as input to LLM. There are different types of prompts, such as zero-shot, one-shot, multi-shot, or chain-of-thought prompts. You can use any of these to obtain the best possible results.
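The difference between zero-shot and few-shot prompts comes down to whether worked examples precede the query. The sketch below assembles either kind from the same parts; the `Input:`/`Output:` labels are an arbitrary convention, not a requirement of any particular model.

```python
from typing import List, Tuple


def build_prompt(instruction: str, examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a zero-shot (no examples) or few-shot prompt."""
    parts = [instruction.strip()]
    for question, answer in examples:
        parts.append(f"Input: {question}\nOutput: {answer}")
    # End with an open "Output:" so the model completes the answer.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


zero_shot = build_prompt("Classify the ticket priority.", [], "Site is down")
one_shot = build_prompt(
    "Classify the ticket priority.",
    [("Cannot reset my password", "medium")],
    "Site is down",
)
```

Building prompts from a function rather than by hand keeps the format identical across requests, which makes outputs easier to compare.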

Response Templates

The response template is the framework defining the structure in which the LLM will generate output. It includes text type, formatting rules, and specialized characters that will be used to frame the output content. A well-defined response template facilitates consistency and better readability for the end-user.

Token Optimization

LLM tokenization is the process of breaking down large texts into smaller units called tokens. Token optimization involves reducing the number of tokens in prompts to enhance performance and reduce the usage and cost of resources required to operate LLM.
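A simple form of token optimization is to cap how much retrieved context goes into a prompt. The sketch below uses a deliberately crude estimate of one token per whitespace-separated word; real tokenizers count differently, so treat the numbers as approximate and substitute your model's tokenizer where precision matters.

```python
from typing import List


def estimate_tokens(text: str) -> int:
    """Rough estimate: one token per whitespace-separated word."""
    return len(text.split())


def fit_to_budget(snippets: List[str], budget: int) -> List[str]:
    """Keep snippets, in order, until the estimated token budget is used up."""
    kept: List[str] = []
    used = 0
    for snippet in snippets:
        cost = estimate_tokens(snippet)
        if used + cost > budget:
            break  # stop at the first snippet that would exceed the budget
        kept.append(snippet)
        used += cost
    return kept


context = fit_to_budget(["a b c", "d e", "f g h i"], budget=5)
```

Because snippets are kept in order, sort them by relevance first so that the most useful context survives the cut.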

Error Handling

Error handling is essential to ensure proper interaction and output generation in LLMs. Inaccuracies usually arise from incorrect prompting, biased training data, and improper or absent feedback.

To resolve these issues:

  • Gather data from reliable sources and cleanse it properly to avoid errors in the outputs.
  • Learn the correct prompting techniques to get correct results quickly.
  • Implement a robust feedback mechanism to further improve LLM operations.

Core Features Development

Feature development is the procedure in which you assign the functionalities to your LLM. When you create LLM with HubSpot data, it should be capable of performing the following functions:

Contact Analysis

Your LLM should perform contact analysis by evaluating the source, creation date, or other activities of any contact record. Such information helps you to identify contacts that are of high priority and those that are inactive.

Deal Intelligence

Deal Intelligence involves utilizing CRM data insights to track and close a sales deal. The LLM created with HubSpot data should provide information to you or your sales team on how to optimize sales.

Email Content Generation

You can use the HubSpot data-trained LLM to produce email content for your email marketing campaigns. The LLM can help you draft more personalized emails based on customers’ preferences and purchasing history.

Meeting Summaries

The LLM created using HubSpot data should summarize the discussions that take place in business meetings and share those summaries with the relevant people. You can utilize it for workflow automation, saving your employees’ time.

Customer Insights

The data stored in HubSpot consists of activity logs, customer feedback, sales lifecycle stages, and deal information. LLM, created with such data, can analyze and produce insights on customer interactions, behavior, and purchasing habits.

Engagement Scoring

Engagement score reflects how customers and clients engage with your products and services. It is calculated using data on customer interaction, purchase frequency, and feedback. Your marketing and sales team can use the engagement score to assess and improve their work performance.
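There is no single formula for an engagement score; a weighted average of normalized signals is one simple option. In the sketch below, the three signal names follow the inputs listed above, while the weights and the 0 to 100 scale are hypothetical choices.

```python
from typing import Dict

# Hypothetical weights; they sum to 1 so the score stays between 0 and 100.
WEIGHTS: Dict[str, float] = {"interactions": 0.5, "purchases": 0.3, "feedback": 0.2}


def engagement_score(signals: Dict[str, float]) -> float:
    """Weighted average of signals already normalized to 0..1, scaled to 0..100."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        value = min(max(signals.get(name, 0.0), 0.0), 1.0)  # clamp to [0, 1]
        total += weight * value
    return round(100 * total, 1)
```

Missing signals count as zero here, which penalizes new contacts; you may prefer to rescale the weights over the signals that are actually present.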

Common Business Applications

Some common business applications of LLM created with HubSpot data include:

Lead Qualification

LLMs trained with HubSpot data can significantly enhance lead generation and qualification. By analyzing customer activity patterns, engagement scores, and sales lifecycle stages, these models identify potential areas for improvement. Additionally, an LLM can support personalized marketing strategies that draw the attention of potential customers and ultimately convert them into paying customers.

Customer Service

You can use LLM to examine the service tickets, emails, and messages stored in HubSpot. The model can then automatically generate relevant responses for each query, improving customer communication. If the issue is complex, it can direct customers to human customer support, reducing the resolution time.

Deployment Process

After completing the development process, you need to finally deploy the LLM. Here are the prominent points that you should consider during the LLM deployment:

Environment Staging

First, you should set up the environment for implementing your LLM by installing all the necessary software, vector databases, memory, and storage resources. Through environment staging, you can ensure the testing of your LLM in a controlled ecosystem.

Version Control

Set up a version control mechanism to track changes made to your LLM weights, data, or source code. This helps manage the evolution of the model and restore previous versions if needed.

CI/CD Pipeline

A CI/CD pipeline consists of a series of steps that you perform to test and deliver a new version of any software. Using CI/CD pipelines, you can roll out updates to your LLM reliably.

Monitoring Setup

To monitor the LLM performance, you can use metrics such as answer correctness, contextual relevance, and presence of hallucinations. Track and take measures to optimize the usage of computational resources. You should also enable your end-users to give feedback and make improvements in your model on its basis.

Backup Procedures

Having backup procedures helps you to manage data loss, deployment, or system failures. For this, a backup of training data must be maintained in raw and processed format. You should also save pre-trained and fine-tuned model weights for immediate rollback during deployment failure.

Recovery Planning

Regularly test your recovery mechanisms to ensure they are functional. You should also create documentation of your recovery plan so that anyone can refer to it in adverse situations.

Conclusion

If you are using HubSpot as a CRM system, you can enhance your business performance and revenue further by developing LLM based on HubSpot data. This blog explains in detail how to create LLM with HubSpot data. It includes all the steps that you should follow, including an accurate deployment procedure for HubSpot data-based LLM. You can use this article as a guide to develop an LLM for your own business and use it to automate your business operations.
