Building data chat agent with PyAirbyte Polygon.io source & Langchain

Learn how to use polygon.io as a data source and use the Langchain experimental agent.

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Should you build or buy your data pipelines?

Download our free guide and discover the best approach for your needs, whether it's building your ELT solution in-house or opting for Airbyte Open Source or Airbyte Cloud.

Download now

Install PyAirbyte and other dependencies

!pip3 install airbyte langchain_openai langchain-experimental

Polygon Airbyte Datasource

import airbyte as ab

source = ab.get_source(
    "source-polygon-stock-api",
    install_if_missing=True,
    config={
      "apiKey": ab.get_secret("POLYGON_API_KEY"),
      "stocksTicker": "AAPL" ,
      "multiplier": 1,
      "timespan": "day",
      "start_date": "2023-01-01",
      "end_date": "2023-12-31",
      "adjusted": "true"
      },

)

# Verify the config and creds by running `check`:
source.check()
Connection check succeeded for `source-polygon-stock-api`.
source.select_streams(['stock_api']) # Select only issues stream
read_result: ab.ReadResult = source.read()

Read Progress

Started reading at 17:27:59.

Read 120 records over 3 seconds (40.0 records / second).

Wrote 120 records over 1 batches.

Finished reading at 17:28:03.

Started finalizing streams at 17:28:03.

Finalized 1 batches over 1 seconds.

Completed 1 out of 1 streams:

  • stock_api

Completed writing at 17:28:05. Total time elapsed: 5 seconds

Completed `source-polygon-stock-api` read operation at 17:28:05.
stock_data = read_result["stock_api"].to_pandas()
stock_data = stock_data.rename(columns={
    'c': 'Close',
    'h': 'High',
    'l': 'Low',
    'o': 'Open',
    't': 'Timestamp',
    'v': 'Volume',
    'vw': 'VWAP',

})

stock_data = stock_data[['Close', 'High', 'Low', 'Open', 'Timestamp', 'Volume', 'VWAP']]


display(stock_data)
Close	High	Low	Open	Timestamp	Volume	VWAP
0	125.07	130.9000	124.1700	130.280	1672722000000	112117471.0	125.7250
1	126.36	128.6557	125.0800	126.890	1672808400000	89100633.0	126.6464
2	125.02	127.7700	124.7600	127.130	1672894800000	80716808.0	126.0883
3	129.62	130.2900	124.8900	126.010	1672981200000	87754715.0	128.1982
4	130.15	133.4100	129.8900	130.465	1673240400000	70790813.0	131.6292
...	...	...	...	...	...	...	...
115	185.01	186.1000	184.4100	184.410	1687233600000	49799092.0	185.2709
116	183.96	185.4100	182.5901	184.900	1687320000000	49515697.0	184.0809
117	187.00	187.0450	183.6700	183.740	1687406400000	51245327.0	186.1166
118	186.68	187.5600	185.0100	185.550	1687492800000	53112346.0	186.4952
119	185.27	188.0500	185.2300	186.830	1687752000000	48088661.0	186.3292
120 rows × 7 columns

Chat with your data

from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
from IPython.display import display
import os

os.environ['OPENAI_API_KEY'] = ab.get_secret("OPENAI_API_KEY")
agent = create_pandas_dataframe_agent(
        ChatOpenAI(model="gpt-3.5-turbo"),
        stock_data,
        verbose=True,
        agent_type=AgentType.OPENAI_FUNCTIONS,
    )
output = agent.invoke("what is the given data is about?")
print(output['output'])
> Entering new AgentExecutor chain...
Based on the columns in the dataframe `df`, the data appears to be financial market data related to trading. Here are the columns and their potential meanings:

- `Close`: Closing price of a financial asset
- `High`: High price of a financial asset during a specific time period
- `Low`: Low price of a financial asset during a specific time period
- `Open`: Opening price of a financial asset
- `Timestamp`: Timestamp of the data point
- `Volume`: Volume of the financial asset traded
- `VWAP`: Volume Weighted Average Price

Therefore, the data in the dataframe `df` seems to represent price and volume data of a financial asset over different timestamps.

> Finished chain.
Based on the columns in the dataframe `df`, the data appears to be financial market data related to trading. Here are the columns and their potential meanings:

- `Close`: Closing price of a financial asset
- `High`: High price of a financial asset during a specific time period
- `Low`: Low price of a financial asset during a specific time period
- `Open`: Opening price of a financial asset
- `Timestamp`: Timestamp of the data point
- `Volume`: Volume of the financial asset traded
- `VWAP`: Volume Weighted Average Price

Therefore, the data in the dataframe `df` seems to represent price and volume data of a financial asset over different timestamps.
output = agent.invoke("what is the yearly returns of the given stock?")
print(output['output'])
> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df['Year'] = pd.to_datetime(df['Timestamp'], unit='ms').dt.year\nyearly_returns = df.groupby('Year')['Close'].last() / df.groupby('Year')['Close'].first() - 1\nyearly_returns"}`


Year
2023    0.48133
Name: Close, dtype: float64The yearly returns of the given stock for the year 2023 are approximately 48.13%.

> Finished chain.
The yearly returns of the given stock for the year 2023 are approximately 48.13%.
output = agent.invoke("what is the 50-day moving average of the entire data?")
print(output['output'])
> Entering new AgentExecutor chain...

Invoking: `python_repl_ast` with `{'query': "df['Close'].rolling(window=50).mean().iloc[-1]"}`


174.67870000000002The 50-day moving average of the entire data is approximately 174.68.

> Finished chain.
The 50-day moving average of the entire data is approximately 174.68.

Similar use cases

No similar recipes were found, but check back soon!