How to Identify Rules from Text Mining

Technologies can shape society and individual behaviours. Historical data helps us understand how different socio-economic actors can influence the emergence of new technologies within and across different socio-technical systems (e.g. mobility, food, energy). Traditional historical work on technological evolution and diffusion has relied on examining archives in considerable detail, which has limited the scale at which it could be studied.

Major advancements in digital technologies and text mining techniques have enabled us to scale up this work. By systematically analysing large corpora of text (e.g. newspapers, books), we can navigate through a wealth of information and make sense of this to increase our understanding of the evolution and directionality of socio-technical systems over long periods of observation.

Over the last couple of years, the Deep Transitions project has explored ways to map changes in socio-technical systems by text mining large corpora of articles. The project has first analysed Scientific American and will also use The New York Times, with coverage spanning over 150 years. This mapping effort has particularly focussed on tracing a building block of the Deep Transitions theory, i.e. the concept of rules. Rules are defined as “humanly devised constraints that structure human action, leading to regular patterns of practice” (Schot and Kanger, 2018). The emergence of mass production has been the major empirical focus for devising and testing a novel methodological approach to map rules.

Taking advantage of increasing data availability and of the special research exemption for text and data mining, the text analysis of articles in Scientific American has enabled us to map the emergence of new inventions and technologies together with the set of rules underpinning the regime of mass production from the 1850s onwards. Scientific American has been a US “techno-enthusiastic magazine” mostly focused on bringing the latest inventions and technologies and, in more recent years, scientific discoveries to an interested audience. The New York Times, by contrast, addresses a more general audience, so its study will provide a more critical perspective on socio-technical change. This, in turn, will map the “socio” component of socio-technical systems and the emergence of rules within and across systems in more detail.

Tracing the concept of rules in raw text data faces a number of challenges: Where can evidence of rules be found? Which are the most suitable sources of text to investigate rules? How can rules be detected as they are embedded in text? How can we minimise the risk of biases and misinterpretations that the application of text-mining techniques can generate? How do we recognise noise in the data? How can we assess the quality of the corpus extracted from PDF images of the magazine, especially for data from the 1850s to the 1950s? How can we validate the developed methodology?

To address these challenges, we have focussed on developing a mixed-method approach that integrates insights from in-depth historical works with the analysis of large amounts of textual data. For each rule, we have identified a set of keywords, which are terms closely associated with the rule as identified in the qualitative historical work. The recurring presence of a rule’s keywords in the Scientific American articles provides evidence of when different rules started to emerge, when they stabilised, and when they declined in the techno-scientific literature. We especially study how rules emerge in different periods within the surges of development (Perez, 2002), namely gestation, installation, turning point and deployment, as well as in the subsequent surge (the 5th surge).
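The keyword-tracking idea described above can be illustrated with a minimal Python sketch. It is not the project's actual pipeline: the function names, the `(year, text)` input format, the grouping by decade and the normalisation per 1,000 tokens are all assumptions made for illustration. Only the example keywords come from this post.

```python
import re
from collections import Counter, defaultdict

# Hypothetical rule-to-keyword mapping. The keywords are examples quoted in
# this post; the data structure itself is an assumption.
RULE_KEYWORDS = {
    "interchangeable parts": ["gauges", "specification", "interchangeability"],
    "electrification": ["electric motors", "arc lighting", "electrification"],
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def count_phrase(tokens, phrase):
    """Count occurrences of a (possibly multi-word) phrase in a token list."""
    words = phrase.lower().split()
    n = len(words)
    return sum(1 for i in range(len(tokens) - n + 1) if tokens[i:i + n] == words)


def keyword_rates_by_decade(articles, rule_keywords):
    """Return {rule: {decade: keyword occurrences per 1,000 tokens}}.

    `articles` is an iterable of (year, text) pairs (assumed format).
    """
    hits = defaultdict(Counter)   # rule -> decade -> keyword occurrences
    totals = Counter()            # decade -> total number of tokens
    for year, text in articles:
        decade = (year // 10) * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        for rule, keywords in rule_keywords.items():
            hits[rule][decade] += sum(count_phrase(tokens, k) for k in keywords)
    return {
        rule: {d: 1000 * hits[rule][d] / totals[d] for d in sorted(totals) if totals[d]}
        for rule in rule_keywords
    }
```

Normalising by the total number of tokens per decade is one possible design choice: it keeps a rise in keyword counts from merely reflecting a larger volume of published text. Exact token matching, as used here, would miss OCR misspellings, which matters for the older issues.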

Our analysis has shown that a number of rules are of particular significance. First, the rule of ‘Making parts interchangeable’, represented by keywords such as “gauges”, “specification” or “interchangeability”, is one of the most discussed rules in Scientific American. Two successive waves in the use of “specification” and “gauges” peaked around the 1850s and the 1890s respectively. This pattern is accompanied by increasing use of the word “interchangeability” over this period.
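Successive waves such as these can be located in a per-decade frequency series by looking for local maxima. The sketch below is a deliberately simple illustration, not the method used in the study: the function name and the `{decade: frequency}` input format are assumptions, and no smoothing is applied.

```python
def local_peaks(series):
    """Return the decades whose value is strictly higher than their neighbours.

    `series` maps a decade (e.g. 1850) to a keyword frequency (assumed format).
    At either end of the series only the single existing neighbour is compared.
    """
    decades = sorted(series)
    peaks = []
    for i, decade in enumerate(decades):
        left = series[decades[i - 1]] if i > 0 else float("-inf")
        right = series[decades[i + 1]] if i < len(decades) - 1 else float("-inf")
        if series[decade] > left and series[decade] > right:
            peaks.append(decade)
    return peaks
```

On noisy data a real analysis would probably smooth the series first (for example with a moving average), since otherwise every small fluctuation counts as a peak.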

Second, the rule of ‘electrification’ has also been widely discussed in Scientific American. Electrification involved the introduction of artefacts that increased the productivity of workers: first “ventilation”, then “electric motors” and “arc lighting”. This was followed by increasing use of the word “electrification” towards the end of the gestation phase.

While other rules emerged within the installation phase, the wartime period, which is understood as the turning point of the 4th surge, saw a re-intensification of all the rules used previously. We interpret this result as reflecting the application of mass production principles to meet wartime needs, such as the production of cars, tanks, airplanes and food. This could be observed for the rules of interchangeability and electrification, but also for many other rules surrounding mass production.

After the turning point, the use of these keywords decreased again, as Scientific American turned to new emerging trends (or rules) that gained importance at this point and may be driving the newer surge. The term “mass production” itself emerged in the 1920s, much later than the individual rules that drove its development. Like many of our other keywords, it peaked in the wartime period, but it was still used relatively frequently thereafter compared to words related to individual rules. The deployment phase of mass production also saw the emergence of newer rules related to the use of robots and a move to digitalisation.

The study of Scientific American has shown that using quantitative tools to map rules can yield results that help us understand how rules become more or less important over time. It corroborates that rules may emerge at different points in time within the gestation and installation phases. Only later does a more general term naming the trend arise, i.e. “mass production”. The turning point is shown to be of particular importance, as existing rules are mobilised towards the wartime effort.

This work is a first step towards studying rules in a quantitative manner. The use of text analysis to understand the role of rules in socio-technical systems can be further expanded by using co-word analysis to understand how rules may diffuse across multiple socio-technical systems, or by using sentiment analysis to explore the social acceptance of rules over their introduction and development.
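As a rough indication of what co-word analysis could look like, the following sketch counts how often pairs of keywords appear in the same article. It is an illustrative assumption, not part of the work reported here: the function name, the choice of the article as the co-occurrence window and the simple substring matching are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def coword_counts(articles, keywords):
    """Count, for each keyword pair, the number of articles mentioning both.

    `articles` is an iterable of article texts. Each pair is stored as a
    tuple in alphabetical order, so ("a", "b") and ("b", "a") are not
    counted separately.
    """
    pair_counts = Counter()
    for text in articles:
        lowered = text.lower()
        # Substring matching is a simplification: "gauges" would also match
        # inside a longer word containing it.
        present = sorted(k for k in keywords if k.lower() in lowered)
        pair_counts.update(combinations(present, 2))
    return pair_counts
```

If keywords are tagged with the socio-technical system they belong to, pairs that link keywords from different systems could serve as one indicator of a rule appearing across systems.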


Kanger, L., Bone, F., Rotolo, D., Steinmueller, E.W., Schot, J. 2020. Deep Transitions: A Mixed Method Study of the Historical Evolution of Mass Production. Working paper.

Perez, C. 2002. Technological Revolutions and Financial Capital: The Dynamics of Bubbles and Golden Ages. Cheltenham, UK: Edward Elgar.

Schot, J., and Kanger, L. 2018. Deep Transitions: Emergence, Acceleration, Stabilization and Directionality. Research Policy 47(6): 1045-1059.
