Install PyAirbyte and other dependencies
!pip3 install airbyte langchain_openai langchain-experimental
Polygon Airbyte Datasource
import airbyte as ab
source = ab.get_source(
"source-polygon-stock-api",
install_if_missing=True,
config={
"apiKey": ab.get_secret("POLYGON_API_KEY"),
"stocksTicker": "AAPL" ,
"multiplier": 1,
"timespan": "day",
"start_date": "2023-01-01",
"end_date": "2023-12-31",
"adjusted": "true"
},
)
# Verify the config and creds by running `check`:
source.check()
Connection check succeeded for `source-polygon-stock-api`.
source.select_streams(['stock_api']) # Select only issues stream
read_result: ab.ReadResult = source.read()
Read Progress
Started reading at 17:27:59.
Read 120 records over 3 seconds (40.0 records / second).
Wrote 120 records over 1 batches.
Finished reading at 17:28:03.
Started finalizing streams at 17:28:03.
Finalized 1 batches over 1 seconds.
Completed 1 out of 1 streams:
Completed writing at 17:28:05. Total time elapsed: 5 seconds
Completed `source-polygon-stock-api` read operation at 17:28:05.
stock_data = read_result["stock_api"].to_pandas()
stock_data = stock_data.rename(columns={
'c': 'Close',
'h': 'High',
'l': 'Low',
'o': 'Open',
't': 'Timestamp',
'v': 'Volume',
'vw': 'VWAP',
})
stock_data = stock_data[['Close', 'High', 'Low', 'Open', 'Timestamp', 'Volume', 'VWAP']]
display(stock_data)
Close High Low Open Timestamp Volume VWAP
0 125.07 130.9000 124.1700 130.280 1672722000000 112117471.0 125.7250
1 126.36 128.6557 125.0800 126.890 1672808400000 89100633.0 126.6464
2 125.02 127.7700 124.7600 127.130 1672894800000 80716808.0 126.0883
3 129.62 130.2900 124.8900 126.010 1672981200000 87754715.0 128.1982
4 130.15 133.4100 129.8900 130.465 1673240400000 70790813.0 131.6292
... ... ... ... ... ... ... ...
115 185.01 186.1000 184.4100 184.410 1687233600000 49799092.0 185.2709
116 183.96 185.4100 182.5901 184.900 1687320000000 49515697.0 184.0809
117 187.00 187.0450 183.6700 183.740 1687406400000 51245327.0 186.1166
118 186.68 187.5600 185.0100 185.550 1687492800000 53112346.0 186.4952
119 185.27 188.0500 185.2300 186.830 1687752000000 48088661.0 186.3292
120 rows × 7 columns
Chat with your data
from langchain.agents.agent_types import AgentType
from langchain_experimental.agents.agent_toolkits import create_pandas_dataframe_agent
from langchain_openai import ChatOpenAI
from IPython.display import display
import os
os.environ['OPENAI_API_KEY'] = ab.get_secret("OPENAI_API_KEY")
agent = create_pandas_dataframe_agent(
ChatOpenAI(model="gpt-3.5-turbo"),
stock_data,
verbose=True,
agent_type=AgentType.OPENAI_FUNCTIONS,
)
output = agent.invoke("what is the given data is about?")
print(output['output'])
> Entering new AgentExecutor chain...
Based on the columns in the dataframe `df`, the data appears to be financial market data related to trading. Here are the columns and their potential meanings:
- `Close`: Closing price of a financial asset
- `High`: High price of a financial asset during a specific time period
- `Low`: Low price of a financial asset during a specific time period
- `Open`: Opening price of a financial asset
- `Timestamp`: Timestamp of the data point
- `Volume`: Volume of the financial asset traded
- `VWAP`: Volume Weighted Average Price
Therefore, the data in the dataframe `df` seems to represent price and volume data of a financial asset over different timestamps.
> Finished chain.
Based on the columns in the dataframe `df`, the data appears to be financial market data related to trading. Here are the columns and their potential meanings:
- `Close`: Closing price of a financial asset
- `High`: High price of a financial asset during a specific time period
- `Low`: Low price of a financial asset during a specific time period
- `Open`: Opening price of a financial asset
- `Timestamp`: Timestamp of the data point
- `Volume`: Volume of the financial asset traded
- `VWAP`: Volume Weighted Average Price
Therefore, the data in the dataframe `df` seems to represent price and volume data of a financial asset over different timestamps.
output = agent.invoke("what is the yearly returns of the given stock?")
print(output['output'])
> Entering new AgentExecutor chain...
Invoking: `python_repl_ast` with `{'query': "df['Year'] = pd.to_datetime(df['Timestamp'], unit='ms').dt.year\nyearly_returns = df.groupby('Year')['Close'].last() / df.groupby('Year')['Close'].first() - 1\nyearly_returns"}`
Year
2023 0.48133
Name: Close, dtype: float64The yearly returns of the given stock for the year 2023 are approximately 48.13%.
> Finished chain.
The yearly returns of the given stock for the year 2023 are approximately 48.13%.
output = agent.invoke("what is the 50-day moving average of the entire data?")
print(output['output'])
> Entering new AgentExecutor chain...
Invoking: `python_repl_ast` with `{'query': "df['Close'].rolling(window=50).mean().iloc[-1]"}`
174.67870000000002The 50-day moving average of the entire data is approximately 174.68.
> Finished chain.
The 50-day moving average of the entire data is approximately 174.68.