manetta berends index / i-could-have-written-that.md

i-could-have-written-that (2016-)

i-could-have-written-that* is a practice-based project about text mining. It questions the readerly nature of text-based machine learning techniques and investigates how they can be used as writing procedures.

compilation of rhetorics from the text mining discourse

Text mining is a subfield of natural language processing and machine learning that works only with text. It applies statistical models to analyse human behaviour, detect expressed sentiment or perform other sorts of text classification. Its ability to process large amounts of text has made it a central tool for search engines, the advertising industry, social media timelines and academic research. Text mining is applied to prevent terrorism, expand markets and automate certain tasks. However, the promises of machine learning practices can be very confusing. Various text mining companies use rhetorics such as "the power to know", "the absolute truth" and "with an accuracy that rivals and surpasses humans" to present their products, which creates the misleading expectation that text mining tools can reveal an absolute and objective truth in textual information. Besides these kinds of statements, machine learning is surrounded by sci-fi AI scenarios, the cloud, invisibility, tech-opportunism and many pre-trained classifiers that are available through APIs.

workshop i-could-have-written-that during Fuzzy Logic, Piet Zwart Institute, Rotterdam, June 2016

In this project, i work with text-based machine learning techniques to explore their writerly nature through code. By following a machine learning process, i try to embrace its nature and transform it into writing tools. It is a way to present machine learning results as a written product that emphasizes the presence of multiple forms of authorship, such as human contributions, machine learning norms and scripted procedures.

The project is built with, among others, NLTK, Pattern, Python, CGI, Jinja and Git.

the first and last poster from the series Text mining is ...

The poster series Text mining is ... is a self-reflective interface based on an existing rule-based text mining classifier. In response to the overly self-confident rhetorics from the text mining discourse, the posters display a large set of utterances stating what text mining is. The vocabulary of the posters is based on a rule-based text mining script called modality.py, which is included in the text mining software package Pattern. The script modality.py is written to detect the degree of certainty of a sentence as a value between -1.0 and +1.0, where values > +0.5 represent facts. To classify a sentence, the concept of 'certainty' is divided into 9 categories, ranging from certain through neutral to doubtful. By applying this rigid method to text mining itself, the statements on the posters break with the idealized strict categories and explore possible positions of a technology that offers both power and misleading myths at the same time. Browse through the full poster set here.
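
To give an impression of how such a rule-based certainty score can work, here is a minimal sketch in Python. It is not the code of modality.py: the word list, the weights and the function names `certainty` and `is_fact` are hypothetical and only chosen for illustration. The sketch assumes that each marker word carries a weight between -1.0 and +1.0, that a sentence's score is the average of the weights found in it, and that a sentence without any marker word counts as fully certain. Only the range and the > +0.5 threshold for facts are taken from the description above.

```python
import re

# Hypothetical mini-lexicon: marker word -> certainty weight.
# Words and weights are illustrative assumptions, not the vocabulary
# or the scores that Pattern's modality.py actually uses.
CERTAINTY_LEXICON = {
    "definitely": 1.0,
    "certainly": 1.0,
    "always": 0.75,
    "clearly": 0.75,
    "probably": 0.25,
    "perhaps": -0.25,
    "maybe": -0.25,
    "might": -0.5,
    "could": -0.5,
    "doubtful": -0.75,
    "unlikely": -0.75,
}

# Taken from the text: values > +0.5 represent facts.
FACT_THRESHOLD = 0.5


def certainty(sentence):
    """Return a certainty score between -1.0 and +1.0 for a sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    weights = [CERTAINTY_LEXICON[w] for w in words if w in CERTAINTY_LEXICON]
    if not weights:
        # Assumption of this sketch: no marker word means a plain assertion.
        return 1.0
    score = sum(weights) / len(weights)
    # Clamp to the documented range.
    return max(-1.0, min(score, 1.0))


def is_fact(sentence):
    """Apply the > +0.5 threshold to decide whether a sentence reads as fact."""
    return certainty(sentence) > FACT_THRESHOLD


if __name__ == "__main__":
    statements = [
        "Text mining is definitely objective.",
        "Text mining might be objective.",
        "Text mining could perhaps be misleading.",
    ]
    # Print the statements ordered from most to least certain.
    for s in sorted(statements, key=certainty, reverse=True):
        print(round(certainty(s), 2), is_fact(s), s)
```

Even this small sketch shows how rigid the method is: a single word such as "might" or "definitely" moves a statement from one side of the fact threshold to the other, regardless of what the statement is about.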

Further research and development of this project is kindly supported by

