Graphic by Glenn Carstens-Peters – https://unsplash.com/de/@glenncarstenspeters

How we use machine learning to create our articles

Transparency is an important building block for user trust in a product. Today, we want to take a closer look at our use of machine learning methods for Aggregata.

Henrik Bartsch

The texts in this article were partly composed with the help of artificial intelligence and corrected and revised by us. The services used for generation are described in the sections below.

Introduction

We, the Aggregata team, see ourselves as both instructors and practitioners of methods and solutions in the field of machine learning (commonly known as “artificial intelligence”). We provide basic information on how reinforcement learning works, how a decision tree learns from data, or how we can use a neural network to identify certain features in images. But there is one thing we have not yet explored: how we actually use (generative) machine learning models to evolve Aggregata. That is our intention for this article.

Usage

First of all: At Aggregata, we care about authentic content that our users should be able to rely on when learning or applying machine learning methods.

This principle means that we are generally willing and able to use appropriate models, as long as doing so does not negatively affect the quality of our articles. Furthermore, all information produced by a neural network should be verified through research, regardless of the usual quality of the model used.

Although this means that we could use (generative) neural networks to generate articles, we do so very rarely. As part of the transparency we want to offer our users, we would like to discuss the different ways in which we use such models.

Text Translation

For textual translations, we use DeepL Translate due to the exceptional accuracy with which this service works. This allows us to find accurate translations even for complex topics when we find sources in different languages and only have a rough idea of what a potential translation (in English or German) might look like.
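For readers who want to try this step themselves, a minimal sketch using the official `deepl` Python package could look as follows. The environment variable name `DEEPL_AUTH_KEY` and the sample sentence are placeholders of ours; this is an illustration, not our exact editorial workflow.

```python
import os

import deepl  # official DeepL API client: pip install deepl

# Assumes a DeepL API key is stored in the environment (placeholder name).
translator = deepl.Translator(os.environ["DEEPL_AUTH_KEY"])

# Translate a technical sentence from German into English.
result = translator.translate_text(
    "Neuronale Netze lernen Merkmale direkt aus den Trainingsdaten.",
    source_lang="DE",
    target_lang="EN-US",
)
print(result.text)
```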

In the meantime, we have also tried translating articles using the T5 model. Outside of the T5 article, we have abandoned this idea (for now) due to suboptimal results.
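For context, the kind of experiment we ran with T5 can be sketched with the Hugging Face `transformers` library. This uses the public `t5-base` checkpoint and is a simplified illustration, not the exact setup from the T5 article; T5 expects a task prefix such as “translate English to German:” in the prompt.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

# T5 was trained on multiple tasks, selected via plain-text prefixes.
tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

prompt = "translate English to German: Decision trees split data by feature thresholds."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=60)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```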

Text Optimization

Text optimization is an important part of how an article reads and is remembered by readers. We have also found methods to make our texts better than we would normally write them. The primary tool we use for this is DeepL Write, which has proven its worth in practice.

We also try to get ideas for different writing styles to make our articles more appealing to all users. Besides Microsoft Copilot or Google Gemma, we also use Llama 2 for this purpose.
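As a rough illustration of how such style brainstorming can work with a locally hosted model, here is a sketch using Llama 2 via `transformers`. The checkpoint `meta-llama/Llama-2-7b-chat-hf` is gated on Hugging Face (access has to be requested), and the prompt is a made-up example, not one of our actual editorial prompts.

```python
from transformers import pipeline

# Llama 2 chat weights are gated; request access on Hugging Face first.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
)

# Llama 2 chat models expect the [INST] ... [/INST] prompt format.
prompt = (
    "[INST] Suggest three opening paragraphs, each in a distinct writing "
    "style, for a blog article introducing reinforcement learning. [/INST]"
)
ideas = generator(prompt, max_new_tokens=300, do_sample=True, temperature=0.8)
print(ideas[0]["generated_text"])
```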

Fact-Checking and Research Support

A very important (and time-consuming) part of our work at Aggregata is searching for and verifying information.

To put this into perspective, when researching particularly complex topics (without the support of machine learning models), research can account for up to 75% of the entire article-writing process.

That 75% is a big part of the process. To avoid errors and prioritize popular sources, we often use Microsoft Copilot to find appropriate sources for specific questions. We then re-evaluate these sources ourselves, but this often saves us from having to search through unsuitable ones - especially sources that contain no further interesting information on our question, or only additional, redundant information.

In rare cases, we also use Microsoft Copilot, Llama 2, and increasingly often Google Gemma for fact-checking. This is especially helpful for complex mathematical topics, where errors are often not immediately obvious. These models act as a kind of quality control for error-prone areas of our articles.
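What such a model-assisted check can look like is sketched below, again with a local Llama 2 chat model. The `fact_check` helper and the sample claim are invented for illustration; in practice, the model’s answer is only a hint, and every flagged passage is still reviewed by us.

```python
from transformers import pipeline

# Same gated Llama 2 chat checkpoint as in the style sketch above.
generator = pipeline(
    "text-generation",
    model="meta-llama/Llama-2-7b-chat-hf",
    device_map="auto",
)

def fact_check(claim: str) -> str:
    """Ask the model to assess a claim; the output still needs human review."""
    prompt = (
        "[INST] Check the following statement for mathematical errors and "
        f"briefly explain your reasoning:\n{claim} [/INST]"
    )
    out = generator(prompt, max_new_tokens=256, do_sample=False)
    return out[0]["generated_text"]

# A claim with a subtle sign error: the correct derivative is s(x) * (1 - s(x)).
print(fact_check("The derivative of the sigmoid function is sigmoid(x) * (1 + sigmoid(x))."))
```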

Generating Content

In certain fields, it is particularly difficult to write texts that are interesting and exciting for the reader - especially if they involve a lot of mathematics or unintuitive, hard-to-grasp fundamentals. To get ideas for examples or writing styles, we sometimes use models such as Llama 2 or Microsoft Copilot. In this way, we can achieve better text quality for topics that would otherwise be less appealing to our users than we would like.

Author’s note: Here, too, we do not use the outputs of the models mentioned above 1:1; rather, we work with those outputs, summarize them, and add the corresponding sources before they appear on Aggregata.

Possible Future Changes

So far we have explained how we have used (generative) machine learning models for our publications. Now we want to give a brief outlook on what we see as the future of Aggregata in this area.

Automated Translation and Automation

We are currently considering having our content automatically translated into languages that we cannot yet offer on Aggregata. Our linguistic focus is on German and English, and this will not change in the foreseeable future, at least not to the extent that we could write professional, high-quality articles in other languages ourselves. We are therefore considering the possibility of adding other languages through automated translation and text optimization.

For example, we are toying with the idea of translating into the following languages:

  1. Spanish
  2. French
  3. Hindi

and a number of other languages. These languages have priority because they are spoken by a particularly large number of people. (Source)
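None of this is implemented yet, but as a thought experiment, an open translation model such as NLLB-200, which covers all three candidate languages, could be driven as in the sketch below. The model choice, the FLORES-200 language codes, and the sample paragraph are our assumptions, not a finished design.

```python
from transformers import pipeline

# NLLB-200 identifies languages via FLORES-200 codes, e.g. "hin_Deva" for Hindi.
translator = pipeline("translation", model="facebook/nllb-200-distilled-600M")

targets = {"Spanish": "spa_Latn", "French": "fra_Latn", "Hindi": "hin_Deva"}
paragraph = "Machine learning models learn statistical patterns from data."

for name, code in targets.items():
    result = translator(paragraph, src_lang="eng_Latn", tgt_lang=code)
    print(f"{name}: {result[0]['translation_text']}")
```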

Image Generation

Up until now, we’ve been using images from Unsplash to make the covers of our articles and publications more graphically appealing. While this isn’t necessarily time-consuming, it’s sometimes difficult to find just the right image that captures the essence of our articles without looking too cluttered. We are experimenting with Stable Diffusion XL, which could help us automate this process, generate more appropriate images, and increase our productivity.
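What this experiment looks like with the `diffusers` library is sketched below: it loads the public SDXL base checkpoint and generates one candidate cover image. The prompt and output filename are placeholders; our actual prompts are still evolving.

```python
import torch
from diffusers import StableDiffusionXLPipeline

# Load the SDXL base weights in half precision; needs a CUDA GPU with ~8+ GB VRAM.
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
)
pipe = pipe.to("cuda")

prompt = (
    "minimalist cover illustration for a blog article about machine learning, "
    "clean composition, soft colors, no text"
)
image = pipe(prompt=prompt, num_inference_steps=30).images[0]
image.save("cover.png")
```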

TL;DR

In this article, we present our methods and framework for how we at Aggregata use machine learning to improve our content. We want to make it clear that we do not use these methods to generate content 1:1 and copy it over, but to gather ideas for making articles more relevant and interesting. We mainly use machine learning for text translation and optimization, but in some places also for research support and fact-checking. We use generative models to make particularly complex problems easier to understand without compromising the quality of the information conveyed. Finally, we discuss our ideas on how we might use neural networks in the future to make Aggregata even better.