Bloomberg is bringing to finance what GPT and ChatGPT brought to general-purpose, everyday chat.
The paper that Bloomberg released reveals the great technical depth of its BloombergGPT machine learning model, applying the type of AI techniques that GPT uses to financial datasets. Bloomberg's Terminal has been the go-to resource for financial market data in the trading and financial world for over four decades. As a result, Bloomberg has acquired or developed a large number of proprietary and curated datasets. In many ways, these datasets are Bloomberg's crown jewels, and in BloombergGPT, that proprietary data is used to build an unprecedented financial research and analysis tool.
The large language models fueling such AI experiments are syntactic and semantic in nature, and are used to predict a new outcome based on existing relationships in and across source texts.
Machine learning algorithms learn from source data and produce a model, a process known as 'training.' Training the BloombergGPT model required approximately 53 days of computation run on 64 servers, each containing 8 NVIDIA 40GB A100 GPUs. For comparison, when we use ChatGPT, we provide a model (or formula) an input, known as the prompt, and the model then produces an output, much like providing an input to a formula and observing the output. Generating these models requires massive amounts of compute power, and thus Bloomberg partnered with NVIDIA and Amazon Web Services to produce the BloombergGPT model.
Since each GPU costs tens of thousands of dollars if purchased new, and is used for only a relatively short duration during model generation, the BloombergGPT team opted to use AWS cloud services to run the computation. Since the cost per server instance is $33 per hour (as currently publicly advertised), a back-of-napkin estimate puts the cost of producing the model alone at more than $2.7 million.
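The back-of-napkin arithmetic, using the figures above (64 servers, $33 per server-hour, roughly 53 days of training), can be laid out as:

```python
# Back-of-napkin estimate of BloombergGPT training cost (figures from the article).
servers = 64                 # AWS instances, each with 8 NVIDIA A100 40GB GPUs
cost_per_server_hour = 33    # USD per server instance, as publicly advertised
training_days = 53           # approximate duration of training

hours = training_days * 24
total_cost = servers * cost_per_server_hour * hours
print(f"${total_cost:,}")    # → $2,686,464, i.e. more than $2.7 million
```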
Part of feeding content to a machine learning model involves fragmenting the content into pieces, or tokens. One way to think of tokens is as the ways we can break down an essay: into words, most obviously, though there are other strategies, like breaking it into sentences or paragraphs. A tokenizer algorithm determines the granularity of the fragmentation, because fragmenting an essay into individual letters, for example, may lose context or meaning; the fragmentation would be too granular to be of any practical use. BloombergGPT fragments its financial data sources into 363 billion tokens using a Unigram model, which offers certain efficiencies and benefits. To play with a tokenizer, try the GPT tokenizer.
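The granularity trade-off can be seen with plain string operations. This toy sketch is not Bloomberg's Unigram tokenizer; it simply fragments the same sample text at three levels, from coarse to fine:

```python
# Toy illustration of tokenization granularity (not a real Unigram tokenizer):
# the same text fragmented three ways.
text = "Markets rallied. Bond yields fell."

sentences = [s.strip() for s in text.split(".") if s.strip()]  # coarsest
words = text.split()                                           # the usual middle ground
characters = list(text)                                        # finest; too granular to be useful

print(sentences)   # → ['Markets rallied', 'Bond yields fell']
print(len(words), len(characters))
```

Real subword tokenizers like the Unigram model sit between the word and character levels, learning which fragments best compress the training corpus.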
The Bloomberg team used PyTorch, a popular free and open-source Python-based deep learning package, to train the BloombergGPT model.
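To give a sense of what PyTorch training looks like, here is a minimal, hypothetical sketch of one training step: a toy next-token predictor (an embedding plus a linear layer) rather than Bloomberg's actual model or code.

```python
import torch
import torch.nn as nn

# Hypothetical toy "language model": embed each token, then score the vocabulary.
vocab_size, embed_dim = 100, 16
model = nn.Sequential(nn.Embedding(vocab_size, embed_dim),
                      nn.Linear(embed_dim, vocab_size))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

tokens = torch.randint(0, vocab_size, (32,))  # a fake token stream
inputs, targets = tokens[:-1], tokens[1:]     # train to predict the next token

# One training step: forward pass, loss, backward pass, parameter update.
logits = model(inputs)
loss = loss_fn(logits, targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(loss.item())
```

BloombergGPT's training repeats steps like this across billions of tokens, distributed over the 512 GPUs described above.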
In the case of BloombergGPT, source datasets include weighted proportions of financial news, company financial filings, press releases, and Bloomberg News content, all collected and curated by Bloomberg over decades. On top of these finance-specific sources, BloombergGPT integrates general, common datasets like The Pile, the Colossal Clean Crawled Corpus (C4), and Wikipedia. Combined, these sources allow BloombergGPT to provide an entirely new way of doing financial research.
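Weighted mixing of this kind can be sketched as sampling training batches from each corpus in proportion to a chosen weight. The weights and names below are purely illustrative, not BloombergGPT's actual proportions:

```python
import random

# Illustrative corpus weights only -- not BloombergGPT's real mixture.
corpus_weights = {
    "bloomberg_financial_data": 0.55,  # filings, news, press releases
    "the_pile": 0.20,
    "c4": 0.20,
    "wikipedia": 0.05,
}

random.seed(0)
# Pick which corpus each of the next 10 training batches is drawn from.
batch_sources = random.choices(
    population=list(corpus_weights),
    weights=list(corpus_weights.values()),
    k=10,
)
print(batch_sources)
```

Heavier weights on the proprietary financial data steer the model toward finance while the general corpora preserve broad language ability.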