
19 Large language models

Learning Objectives

  • Explain accounting-focused applications of LLMs
  • Use LLMs to summarize long documents
  • Use LLMs to extract data from documents
  • Provide structured data to LLMs for analysis

Chapter content

Large language models (LLMs), such as OpenAI's GPT models, have improved the accuracy and broadened the use of natural language processing. Tasks that were often done with other tools, such as summarizing, identifying topics, and extracting text, are now often done with LLMs. The change comes from several factors. First, LLMs have been trained on very large text corpora. Second, LLMs have unprecedented size and complexity, with billions of parameters. Third, LLMs have been made accessible to developers and users.

This chapter introduces accounting-related uses for LLMs and shows how to work with the OpenAI API from R.

Accounting applications of LLMs

LLMs are currently used, or are being developed for use, in many accounting applications.

Example in R

Many LLM providers offer chatbot-style user interfaces that make their models easy to use. They also provide API (application programming interface) access, which makes it possible to integrate LLMs with other tools. For example, through an API you can build apps that call the LLM, or feed text or data to the LLM in an automated way. The code below demonstrates simple cases of accessing OpenAI’s LLMs from R using the “openai” package.

You must first sign up for an account with OpenAI and get an API key. Many tutorials cover this; for instance, the first two steps of this tutorial show how: https://tilburg.ai/2024/03/tutorial-openai-api-in-r/. You must also install the openai package. Once these steps are complete, the code below creates a simple interaction with an OpenAI LLM in R.

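# Load the openai package, which provides create_chat_completion()
library(openai)

# Ask for the API key at run time and store it in the OPENAI_API_KEY
# environment variable, where the openai package looks for it.
# readline() only waits for input in an interactive R session.
openai.api_key <- readline("Input API key:")
Sys.setenv(OPENAI_API_KEY = openai.api_key)

# The system message sets the assistant's behavior; the user message is the
# question typed at the prompt. temperature = 0 makes replies less random.
resp <- create_chat_completion(
  model = "gpt-4o-mini",
  temperature = 0,
  messages = list(
    list(
      "role" = "system",
      "content" = "You are a friendly assistant."),
    list(
      "role" = "user",
      "content" = readline("How can I help you today?:"))))

# The reply text is in the message.content column of resp$choices
resp$choices$message.content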
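
Because the examples in this chapter repeat the same create_chat_completion() call, it can help to wrap that call in a small function. The sketch below is one way to do it, assuming the openai package interface used above (a messages list of role/content pairs and a reply in resp$choices$message.content); build_messages() and ask_llm() are hypothetical helper names, not functions from the package.

```r
# Hypothetical helper: build the messages list that create_chat_completion()
# expects, with one system message and one user message.
build_messages <- function(system_prompt, user_text) {
  list(
    list(role = "system", content = system_prompt),
    list(role = "user", content = user_text)
  )
}

# Hypothetical helper: send one prompt to the model and return the reply text.
# Assumes the openai package is installed and OPENAI_API_KEY is set.
ask_llm <- function(system_prompt, user_text, model = "gpt-4o-mini") {
  resp <- openai::create_chat_completion(
    model = model,
    temperature = 0,
    messages = build_messages(system_prompt, user_text)
  )
  # choices comes back as a data frame; keep the first reply
  resp$choices$message.content[1]
}

# Building the messages does not contact the API
msgs <- build_messages("You are a friendly assistant.", "What is a balance sheet?")
str(msgs)
```

With a key set, ask_llm("You are a friendly assistant.", "What is a balance sheet?") sends the same kind of request as the example above and returns only the reply text, which makes the function easy to call in loops.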

The basic chat example above can be modified to send text or data from R to the LLM. The next example uses Walmart’s first-quarter fiscal 2026 earnings release, available here: https://corporate.walmart.com/content/dam/corporate/documents/newsroom/2025/05/15/walmart-releases-q1-fy26-earnings/q1-fy26-earnings-release.pdf

The code below reads the earnings release and requests a summary.

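# pdftools reads PDF files; pdf_text() returns one string per page
library(pdftools)

txt <- pdf_text("https://corporate.walmart.com/content/dam/corporate/documents/newsroom/2025/05/15/walmart-releases-q1-fy26-earnings/q1-fy26-earnings-release.pdf")
# Combine the pages into a single string for the prompt
txt <- paste(txt, collapse = " ")

# Same call as before (the openai package must already be loaded), but the
# system message now states the task and output format, and the user message
# is the full text of the release
resp <- create_chat_completion(
  model = "gpt-4o-mini",
  temperature = 0,
  messages = list(
    list(
      "role" = "system",
      "content" = "Your job is to summarize the key points of earnings announcements. Respond with a single paragraph and three bullet points."),
    list(
      "role" = "user",
      "content" = txt)))
resp$choices$message.content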
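
Longer documents, such as a full annual report, may exceed the model's context window, and long inputs also cost more. A common workaround, often called map-reduce summarization, is to summarize the document in pieces and then summarize the partial summaries. The sketch below works on the page vector returned by pdf_text() before the pages are pasted together. chunk_pages(), summarize_long_text(), and the 40,000-character default (roughly 10,000 tokens under the common rule of thumb of about four characters per token for English text) are hypothetical choices, and `ask` stands for any function that takes a system prompt and a user message and returns the model's reply text.

```r
# Hypothetical helper: group page texts (as returned by pdftools::pdf_text())
# into chunks whose combined length stays under max_chars.
# A single page longer than max_chars becomes its own chunk.
chunk_pages <- function(pages, max_chars = 40000) {
  chunks <- character(0)
  current <- ""
  for (page in pages) {
    candidate <- if (nzchar(current)) paste(current, page, sep = "\n") else page
    if (nchar(candidate) > max_chars && nzchar(current)) {
      # The current chunk is full: store it and start a new one with this page
      chunks <- c(chunks, current)
      current <- page
    } else {
      current <- candidate
    }
  }
  if (nzchar(current)) chunks <- c(chunks, current)
  chunks
}

# Hypothetical map-reduce summary: summarize each chunk, then combine the
# partial summaries in one final request.
summarize_long_text <- function(pages, ask, max_chars = 40000) {
  chunks <- chunk_pages(pages, max_chars)
  partial <- vapply(chunks, function(chunk) {
    ask("Summarize the key points of this part of an earnings announcement.", chunk)
  }, character(1), USE.NAMES = FALSE)
  # With zero or one chunk there is nothing to combine
  if (length(partial) <= 1) return(partial)
  ask("Combine these partial summaries into a single paragraph and three bullet points.",
      paste(partial, collapse = "\n\n"))
}
```

In practice, `ask` could be a small wrapper around create_chat_completion() that returns resp$choices$message.content, and pages would be the unpasted output of pdf_text(). Splitting on page boundaries keeps each table or section on a page together, at the cost of chunks of uneven size.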

The benefit of code such as the summary example above is that it can be reused to send the same prompt for a large number of documents. Simple adjustments adapt the code to other purposes. For example, the system prompt could ask the LLM to extract key pieces of information, such as total sales for the quarter.
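
A minimal sketch of that batch extraction follows. It assumes a hypothetical `ask` function (any function that takes a system prompt and a document's text and returns the reply text, such as a wrapper around create_chat_completion()) and asks the model to reply with only a number, so the reply can be converted with as.numeric(). extract_amounts(), parse_amount(), and the prompt wording are illustrative assumptions.

```r
# Hypothetical helper: turn a reply such as "$1,234.5" into 1234.5.
# Removes dollar signs, commas, and spaces; returns NA if no number remains.
parse_amount <- function(reply) {
  cleaned <- gsub("[$,[:space:]]", "", reply)
  suppressWarnings(as.numeric(cleaned))
}

# Hypothetical batch extraction: apply one extraction prompt to every document
# and collect the raw replies and parsed values in a data frame.
extract_amounts <- function(docs, ask,
                            prompt = paste("Extract total sales for the quarter.",
                                           "Respond with only the number in millions of US dollars.")) {
  replies <- vapply(docs, function(doc) ask(prompt, doc), character(1),
                    USE.NAMES = FALSE)
  data.frame(
    document = if (is.null(names(docs))) seq_along(docs) else names(docs),
    reply = replies,
    value = parse_amount(replies)
  )
}
```

Keeping the raw reply next to the parsed value makes it easy to spot rows where value is NA, which flag replies that were not a clean number and are worth checking by hand.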

The code below changes the previous code to use data as input. The data is available here: https://www.dropbox.com/scl/fi/gt2y8vpxakq5c25z54hbl/CompanyDivisions.csv?rlkey=tcylo8t9omhof3iabzpkfybuw&st=fwisvpai&dl=0. Assuming the file has been imported as a data frame named “df”, the following code requests a summary of the data.

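# format_csv() comes from the readr package; it converts df to CSV-formatted text
library(readr)
txt <- format_csv(df)

# The system message describes the data and the task; the CSV text is the user message
resp <- create_chat_completion(
  model = "gpt-4o-mini",
  temperature = 0,
  messages = list(
    list(
      "role" = "system",
      "content" = "The following data set is a csv file that contains sales numbers from different company divisions. Summarize 3 insights from the data set."),
    list(
      "role" = "user",
      "content" = txt)))
resp$choices$message.content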
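
If readr is not available, or if the data frame is large, a base R alternative such as the sketch below can convert the data to CSV text and cap the number of rows sent, since every row adds to the prompt's length and usage cost. df_to_csv_text() and its max_rows default are hypothetical choices for illustration.

```r
# Hypothetical helper: convert a data frame to CSV text using base R only,
# keeping at most max_rows rows so the prompt stays short.
df_to_csv_text <- function(df, max_rows = 500) {
  n <- nrow(df)
  kept <- df[seq_len(min(n, max_rows)), , drop = FALSE]
  # write.csv() prints to the console by default; capture.output() collects
  # the printed lines as a character vector
  lines <- capture.output(write.csv(kept, row.names = FALSE))
  txt <- paste(lines, collapse = "\n")
  if (n > max_rows) {
    # Tell the model that the data were truncated
    txt <- paste0(txt, "\n(rows omitted: ", n - max_rows, ")")
  }
  txt
}
```

The result can replace format_csv(df) as the user message, for example txt <- df_to_csv_text(df). For large data sets, sending aggregated totals instead of raw rows is often a better choice.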

The data-summary code can be altered for other purposes, for example, to request output in a specific format, to generate R code that analyzes the data, or to extract key items.
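
When the goal is for the LLM to write R code that analyzes the data, the model may only need the column names and types rather than the rows themselves, which also keeps the figures out of the prompt. The sketch below builds such a prompt with base R; describe_columns(), code_prompt(), and the prompt wording are hypothetical, and any code the model returns should be read before it is run.

```r
# Hypothetical helper: describe each column by name and class,
# without including any of the data values.
describe_columns <- function(df) {
  vapply(names(df), function(col) {
    sprintf("%s (%s)", col, class(df[[col]])[1])
  }, character(1), USE.NAMES = FALSE)
}

# Hypothetical prompt builder: column descriptions plus the analysis task.
code_prompt <- function(df, task) {
  paste0("A data frame named df has these columns:\n",
         paste(describe_columns(df), collapse = "\n"),
         "\n", task,
         "\nRespond with R code only.")
}
```

For example, code_prompt(df, "Compute total sales by division.") could be sent as the user message, with the model's reply pasted into an R script for review.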


References

https://tilburg.ai/2024/03/tutorial-openai-api-in-r/


License

Data Analytics with Accounting Data and R Copyright © by Jeremiah Green. All Rights Reserved.