A large part of running a business is record keeping and reporting. Among these records and reports, finance is arguably one of the most important areas. All companies must keep detailed records of their finances, capital, investments, spending, and the like. These records are usually kept for legal reasons but can also be useful when working with analysts and investors. Every year, and sometimes every quarter, companies present the data, their findings, and strategies in a Financial Report. These reports are extensive and lengthy and, as such, require a significant amount of time and effort to create. It is safe to assume that many companies invest a good amount of money and working hours to produce these reports, and automating this process, even partially, has the potential to improve efficiency and reduce spending within the company.
Automation and its Challenges
Of course, it would be great if the entire report could be magically generated with the press of a button, but technology isn’t quite there yet. Instead, we need to be a little realistic and see what, if anything, could be automated. Financial Reports are a collection of numbers, charts, and graphs, coupled with written sections. The main written parts of a Financial Report can be broken into two categories: Narratives and Financial Statements. A Narrative is any writing that doesn’t directly rely on or describe data, and is usually not finance related. Financial Statements are more or less the opposite: they are heavily tied to financial topics and include written descriptions of charts or tables. Financial Statements also tend to look much alike from one report to the next and show little variety in general.
Of these two categories, it’s clear that Financial Statements have better potential for automation. Describing a table, for example, is fairly objective and isn’t very open to interpretation (not to be confused with explaining a table!). Sure, one could create an “automation” tool that reads in table values and pastes them into sentence templates, but that would be pretty boring to read. While the meaning of a sentence remains consistent, word choice and sentence structure are left to the author. This is sounding more and more like a Machine Translation problem, which is certainly realistic. Essentially, we want to create a model which can be given a table of financial records and then generate sentences which describe the table.
How does one read and describe a Table?
It’s easy enough to give a person a table and ask them to describe it. Depending on the table, they might need a few seconds to figure out what they’re looking at, but almost anyone should be able to successfully perform such a task. However, it’s not as simple as feeding a neural network all of the tables and expecting a finished report to be printed out. Building a network of this size would require an almost comical amount of computing power, and even then there’s no guarantee that it would work. The problem needs to be broken down a bit first. Instead of focusing on entire tables and reports, we can simplify the problem by generating one sentence at a time.
By focusing on one sentence at a time, we greatly reduce the complexity of the problem, as well as the size of the final model. This means we also need to reconsider what our input data should look like. Thinking backwards, if we have a sentence which describes something about a table, this sentence only relies on a few cells within the table. As far as this sentence is concerned, the other cells might as well not exist, which means each sample’s input data is smaller than the full table. This also makes sense from a human point of view. Handing someone a table and simply saying “describe it” is less clear than pointing at a number and saying “describe what this number means”. Our task changes slightly, but the end goal stays the same. We now want a model that can process a subsection or a list of cells from a table and generate a sentence describing them.
How should the data look?
There are a few different ways to represent a table as a feature vector. One could pass the table as a list of sentences, for example, going row by row or column by column. This sounds reasonable, but a lot of information is lost this way, because we no longer know where rows or columns begin and end. We can add some special tokens to keep track of this, and even extend them to mark more information such as cells, headers, and table titles. These added tokens make our input data look similar to HTML, with each feature followed by its closing token. For example, after each cell there is a </cell> token, after each row there is a </row> token, and so on.
An example input “sentence” could look like this:
Insurance and Guarantee Program Liabilities of September 30, 2020 and 2019</section_title> 2020</col_header> 2019</col_header></row>
Federal Crop Insurance – Department of Agriculture</row_header> 7.7</cell> 8.9</cell></row>.
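The token format above can be produced with a few lines of code. The following is a minimal sketch, not the original pipeline: the function name linearize_subtable, the (row header, cells) layout of the rows, and the choice to join everything with single spaces are all assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

# Hypothetical layout: each row is (row header, list of cell values).
Row = Tuple[str, Sequence[str]]


def linearize_subtable(title: str, col_headers: Sequence[str], rows: Sequence[Row]) -> str:
    """Flatten a (sub-)table into one string, closing each feature with its token."""
    parts: List[str] = [f"{title}</section_title>"]

    # The column headers form the first "row" of the linearized table.
    header_row = " ".join(f"{header}</col_header>" for header in col_headers)
    parts.append(f"{header_row}</row>")

    # Every data row is its row header followed by its cells.
    for row_header, cells in rows:
        cell_text = " ".join(f"{cell}</cell>" for cell in cells)
        parts.append(f"{row_header}</row_header> {cell_text}</row>")

    # Assumption: features are separated by single spaces.
    return " ".join(parts)


if __name__ == "__main__":
    print(linearize_subtable(
        "Insurance and Guarantee Program Liabilities of September 30, 2020 and 2019",
        ["2020", "2019"],
        [("Federal Crop Insurance – Department of Agriculture", ["7.7", "8.9"])],
    ))
```

Because every feature carries its own closing token, the model can still tell titles, headers, and cells apart even though the table has been flattened into a single sequence.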
Knowing how the data will look, we now need to figure out which cells will be selected from the full table in the first place. At this point, we already had what was more or less a corpus of sentences from a report, each paired with the table it describes. An algorithm processed each sentence and matched any words or monetary amounts which also appeared in the table. These matched cells, along with their respective row and column headers and the table title, make up the sub-table that is given as input data. The sentence then becomes the target data, which the model attempts to reproduce.
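The exact matching algorithm is not spelled out here, so the following is only a rough sketch of the idea under simple assumptions: cell values are compared to sentence tokens after lowercasing and stripping currency symbols and thousands separators, and only single-token cell values can match. The names normalize, sentence_tokens, and match_cells are hypothetical, and the example sentence is invented for illustration.

```python
import re
from typing import List, Sequence, Set, Tuple


def normalize(text: str) -> str:
    """Lowercase and drop dollar signs and thousands separators."""
    return re.sub(r"[$,]", "", text.lower()).strip()


def sentence_tokens(sentence: str) -> Set[str]:
    """Split a sentence into words and numbers, keeping decimals such as 7.7 intact."""
    return set(re.findall(r"\d+(?:\.\d+)?|[a-z]+", normalize(sentence)))


def match_cells(
    sentence: str,
    col_headers: Sequence[str],
    rows: Sequence[Tuple[str, Sequence[str]]],
) -> List[Tuple[str, str, str]]:
    """Return (row header, column header, value) for each cell found in the sentence."""
    tokens = sentence_tokens(sentence)
    matches = []
    for row_header, cells in rows:
        for col_header, value in zip(col_headers, cells):
            if normalize(value) in tokens:
                matches.append((row_header, col_header, value))
    return matches


if __name__ == "__main__":
    # Invented example sentence, used only to demonstrate the matching step.
    sentence = "Federal crop insurance liabilities were 7.7 in 2020."
    rows = [("Federal Crop Insurance", ["7.7", "8.9"])]
    print(match_cells(sentence, ["2020", "2019"], rows))
```

The matched cells carry their row and column headers with them, so together with the table title they contain everything needed to build the sub-table for one training sample.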
Transformer Networks
A relatively new, yet popular neural network is the Transformer Network. Transformers can be used for a wide variety of tasks, but are mainly used for Natural Language Processing, Text Generation, and Machine Translation. Earlier it was mentioned that this task is essentially Machine Translation: we wish to go from a sentence in our “table language” to a sentence in a more familiar language, like German or English. For this task, a BERT2BERT transformer was used, which consists of two BERT networks connected to each other, one acting as the encoder and the other as the decoder.
Transformers are built using encoders and decoders, and rely heavily on Attention, specifically Multi-Head Self-Attention. Attention is a technique which allows neural networks to focus on the most important features and information, making it easier to learn dependencies and patterns within data. Attention also makes it possible for networks to work with longer sequences of information than, for example, an RNN can handle. Transformers also have a few other tricks which allow them to train more efficiently and more quickly. Despite working with sequential data, they train non-sequentially with the help of positional encodings. The benefit of this is that they are able to train on the entire sequence at once, instead of processing the parts of the sequence one by one. On a GPU, this parallelized training can be sped up greatly.
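To make the idea of Self-Attention a little more concrete, here is a heavily simplified sketch in plain Python. It assumes a single head and uses the input vectors directly as queries, keys, and values. A real Transformer applies learned projection matrices first and runs several heads in parallel, so this is an illustration of the mechanism rather than the model used here.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Turn raw scores into weights that are positive and sum to 1."""
    highest = max(scores)  # subtracting the maximum keeps exp() numerically stable
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(x: List[Vector]) -> List[Vector]:
    """Scaled dot-product self-attention with queries = keys = values = x."""
    dim = len(x[0])
    output = []
    # Each position is handled independently of the other positions' outputs,
    # which is why this computation can be run for the whole sequence at once.
    for query in x:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in x
        ]
        weights = softmax(scores)
        # The output is a weighted average of all value vectors.
        output.append([
            sum(weight * value[j] for weight, value in zip(weights, x))
            for j in range(dim)
        ])
    return output


if __name__ == "__main__":
    for row in self_attention([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]):
        print([round(value, 3) for value in row])
```

Each output vector mixes information from every position in the sequence, weighted by how strongly the positions relate to one another, which is how the network picks up dependencies between distant tokens.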
Does it work?
Table to Text experiments have been going on for a few years now and have seen varied success with Transformers. In theory, generating Financial Statements should be an easier task than the typical Table to Text experiment. A typical dataset used for this is called ToTTo and consists of over 120,000 training examples built from tables found across Wikipedia. This dataset was an inspiration for how our dataset was created, namely the tables being broken into smaller subtables, as well as the matching algorithm for determining which cells were relevant to the sentence. The ToTTo dataset deals with many different topics, so a model needs to be extremely general purpose. Google’s team was able to achieve a BLEU score of 0.44 in their original ToTTo paper using BERT2BERT.
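For readers unfamiliar with BLEU, the score measures how many word sequences (n-grams) in a generated sentence also appear in the reference sentence, with a penalty for output that is too short. The sketch below is a simplified, single-reference, sentence-level version with no smoothing, written on the 0 to 1 scale. Published scores are normally computed over a whole corpus with established tools, so treat this as an illustration of the idea, not as the evaluation code used for these experiments.

```python
import math
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified BLEU: geometric mean of n-gram precisions times a brevity penalty."""
    cand, ref = candidate.split(), reference.split()
    if not cand:
        return 0.0

    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts, ref_counts = ngrams(cand, n), ngrams(ref, n)
        total = sum(cand_counts.values())
        # "Clipping": an n-gram only counts as often as it occurs in the reference.
        clipped = sum(min(count, ref_counts[gram]) for gram, count in cand_counts.items())
        if total == 0 or clipped == 0:
            return 0.0  # no smoothing in this sketch
        log_precisions.append(math.log(clipped / total))

    # Candidates shorter than the reference are penalized.
    brevity_penalty = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return brevity_penalty * math.exp(sum(log_precisions) / max_n)


if __name__ == "__main__":
    reference = "the cat sat on the mat"
    print(bleu("the cat sat on the mat", reference))
    print(bleu("the cat sat on the mat today", reference))
```

An exact copy of the reference scores 1, and the score drops as the generated sentence diverges from it in wording or length.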
After our experiments, our BERT2BERT model was able to reach a surprising BLEU score of 0.63. Since our model only needs to familiarize itself with financial tables, it doesn’t need to be general purpose. This narrower domain could explain why it outperformed the BERT2BERT model on ToTTo data. The sentences that the model produced were of high quality and were faithful to the data found within the table. The grammar and fluency were also top notch, and even low-scoring sentences had few if any grammatical errors. As a prototype, this model certainly shows potential and could even be production-ready after a few small tweaks or with a training set of higher quality.