Machine Learning. Data science. Natural language processing.
These buzzwords are taking over the internet, but to the vast majority of us the breakthroughs behind them simply seem like magic. As more processes, from the mundane to the complicated, become automated, it is increasingly important to decode the magic and truly understand how the technology works. Let’s look at this in greater depth through Nexus S.I.X, our latest solution for extrapolating sustainability insights, and uncover how we use Artificial Intelligence (AI) to help improve the process.
What issues is ESG facing right now?
In accordance with the various ESG frameworks or internal guidelines, companies have started disclosing their data – whether about their carbon emissions, diversity numbers or community engagement efforts. However, organising this information into a usable and comparable format remains an issue. Data is strewn across different documents, some of it is unavailable and a lot of it is unstructured.
How do we solve this?
Leveraging AI to structure the vast amount of unstructured ESG data is no easy task. The initial stage still relies on manual work: extracting relevant data from the files, annotating the documents and verifying the data. These form the initial sets of information used to answer the key questions that investors want answered.
- Parallel track
As the manual track continues and investors receive the information they want, annotators tag each data point with the question being asked, the document it was found in and the key words used to identify it. These tagged records are fed as training sets into a parallel AI system.
After enough data is fed into the system, the algorithm engages in a similar process, extracting data and consolidating it into a usable format. The machine learns over time and starts to pick up patterns, replicating them to deliver more accurate results. The outputs of the two tracks are then compared to evaluate the accuracy of the algorithm.
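The comparison between the two tracks can be pictured with a minimal sketch. It assumes each track produces one answer per company and question, and it measures how often the automated track reproduces the manually verified answer. The function name, field names and figures are hypothetical illustrations, not the Nexus implementation.

```python
from typing import Dict, Tuple

# Key: (company, question); value: the extracted answer.
# All names and figures below are hypothetical.
Extraction = Dict[Tuple[str, str], str]


def agreement_rate(manual: Extraction, automated: Extraction) -> float:
    """Share of manually verified answers that the automated track reproduces."""
    if not manual:
        raise ValueError("manual track is empty; nothing to compare against")
    matches = sum(
        1 for key, value in manual.items()
        if automated.get(key) == value
    )
    return matches / len(manual)


manual_track = {
    ("Company A", "scope_1_emissions_tco2e"): "12000",
    ("Company A", "women_on_board_pct"): "40",
    ("Company B", "scope_1_emissions_tco2e"): "8500",
    ("Company B", "women_on_board_pct"): "25",
}
ai_track = {
    ("Company A", "scope_1_emissions_tco2e"): "12000",
    ("Company A", "women_on_board_pct"): "40",
    ("Company B", "scope_1_emissions_tco2e"): "8500",
    # The automated track misread this value.
    ("Company B", "women_on_board_pct"): "52",
}

rate = agreement_rate(manual_track, ai_track)
print(f"Agreement with manual track: {rate:.0%}")
```

Here the manual track is treated as the reference, so a value that the automated track misses or misreads lowers the rate.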
This entire process requires time as well as a large amount of data. Feeding the system a single list of companies ranked from most to least ESG compliant doesn’t allow it to automatically understand how to rank another set of companies. First, a large amount of data is needed to train and test the system. Second, we need to give the system a set of instructions, or code, so that it understands exactly why a company has the rank it has and can imitate that reasoning. We need to break down which factors we considered when assigning the score, as well as the relative importance of each factor.
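To make the idea of factors and their relative importance concrete, here is a minimal sketch of a weighted score. The factor names, weights and per-factor scores are assumptions chosen for illustration; the actual factors and weighting used in Nexus are not described here.

```python
from typing import Dict

# Hypothetical factors and weights; the weights are chosen to sum to 1.
WEIGHTS: Dict[str, float] = {
    "carbon_emissions": 0.5,
    "board_diversity": 0.3,
    "community_engagement": 0.2,
}


def esg_score(factor_scores: Dict[str, float],
              weights: Dict[str, float] = WEIGHTS) -> float:
    """Weighted sum of per-factor scores (each assumed to be on a 0-100 scale)."""
    missing = set(weights) - set(factor_scores)
    if missing:
        raise ValueError(f"missing factor scores: {sorted(missing)}")
    return sum(weights[name] * factor_scores[name] for name in weights)


# Hypothetical per-factor scores for two companies.
companies = {
    "Company A": {"carbon_emissions": 80, "board_diversity": 60,
                  "community_engagement": 70},
    "Company B": {"carbon_emissions": 50, "board_diversity": 90,
                  "community_engagement": 40},
}

# Rank from highest to lowest overall score.
ranking = sorted(companies, key=lambda name: esg_score(companies[name]),
                 reverse=True)
print(ranking)
```

Writing the weights down explicitly is what lets the ranking be reproduced for a new set of companies, which a single ranked list cannot do on its own.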
Nexus also uses an expert-in-the-loop process to enhance operations. End-users can suggest changes to the extracted data through the platform itself instead of going through multiple layers of bureaucracy. Issues with the extracted data can therefore be flagged immediately, allowing the input to be amended. This increases the accuracy of the training sets and signals to solution architects that the algorithm needs improving. The result is a virtuous cycle: with greater speed and the ability to verify large amounts of data, the accuracy of new datasets increases quickly, making the entire process much swifter and more efficient.
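The feedback loop can be sketched roughly as follows, assuming that a user correction does two things: the corrected value joins the training set, and the original extraction is kept as a signal for the solution architects. The class and field names are hypothetical and only illustrate the flow.

```python
from dataclasses import dataclass, field, replace
from typing import List, Optional


@dataclass
class ExtractedValue:
    # Hypothetical record of one extracted data point.
    company: str
    question: str
    value: str
    source_document: str
    verified: bool = False


@dataclass
class ReviewQueue:
    training_set: List[ExtractedValue] = field(default_factory=list)
    flagged_for_architects: List[ExtractedValue] = field(default_factory=list)

    def review(self, item: ExtractedValue,
               corrected_value: Optional[str] = None) -> None:
        """Record an end-user review; a correction also flags the original."""
        if corrected_value is not None and corrected_value != item.value:
            # Keep the wrong extraction as a signal for improving the algorithm.
            self.flagged_for_architects.append(item)
            item = replace(item, value=corrected_value)
        # Confirmed or corrected, the value is now verified training data.
        self.training_set.append(replace(item, verified=True))


queue = ReviewQueue()
extracted = ExtractedValue("Company B", "women_on_board_pct", "52",
                           "annual_report.pdf")
queue.review(extracted, corrected_value="25")
print(queue.training_set[0].value, len(queue.flagged_for_architects))
```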
- Quantitative or Qualitative?
While extracting information from company documents allows us to calculate numerical quantities and understand the firm’s past behaviour, there is also non-numerical information that can give analysts insight into whether to invest. Some of this comes in the form of goals, such as net-zero emission targets, while the rest comes in the form of actionable steps the company pledges to take in order to reach these goals.
Once again, while this process starts out manually, natural language processing is used concurrently so that the system can eventually extract the relevant text in the right context. This involves identifying the key words and the combinations of phrases and dates that relate to the activity the analyst wants to evaluate. Information is extracted from company records as well as recent news, and the company is scored accordingly.
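As a rough illustration of matching key words together with dates, the sketch below finds sentences that mention a target phrase alongside a year. It uses plain keyword matching and a regular expression as a simplified stand-in for the natural language processing described above; the keyword list and sample text are hypothetical.

```python
import re
from typing import Dict, List

# Hypothetical key words an analyst might care about.
KEYWORDS = ["net-zero", "net zero", "carbon neutral"]
# Four-digit years in the 2000s, e.g. 2040.
YEAR_PATTERN = re.compile(r"\b(20\d{2})\b")


def find_targets(text: str) -> List[Dict[str, object]]:
    """Return sentences that contain both a key word and a year."""
    results = []
    # Naive sentence split: ., ! or ? followed by whitespace.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        lowered = sentence.lower()
        matched = [kw for kw in KEYWORDS if kw in lowered]
        years = [int(y) for y in YEAR_PATTERN.findall(sentence)]
        if matched and years:
            results.append(
                {"sentence": sentence, "keywords": matched, "years": years}
            )
    return results


sample = (
    "The company opened a new office in 2021. "
    "It pledges to reach net-zero emissions by 2040. "
    "Community programmes will continue."
)

for hit in find_targets(sample):
    print(hit["keywords"], hit["years"], hit["sentence"])
```

Requiring both a key word and a date in the same sentence is one simple way to keep the match in context: the first sample sentence contains a year but no target, so it is ignored.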
So, what will happen in the future?
Since we are an AI company, it makes sense to do some predictive analysis of our own. As we feed in more information and the model keeps learning, its accuracy will improve and we will be able to use the AI track as our main medium of extraction. The algorithm will not only be able to extract relevant numerical information but also know what score to assign to it, and assess how that score should change when new developments concerning the company appear in the news.
Join us as we move steadily along this automation journey.