
Researchers Develop Affordable, Versatile Model for Tracking Potential Production Accidents

A team of scientists from ITMO University has developed software that can automatically track potential human causes of industrial accidents, such as drinking or smoking in the workplace.

19 Sep, 2025. 3 minutes read

Neural network-generated

This article was first published on news.itmo.ru.

A team of scientists from ITMO University has developed software that automatically tracks potential human causes of industrial accidents, such as drinking or smoking in the workplace. Unlike its alternatives, the model can detect up to 10 events simultaneously, achieves 80% accuracy (higher than that of VideoMAE, the best open-source model trained on the same data), and requires fewer computational resources. The algorithm is already in use at a major production site in Perm Krai, where it has cut the number of required in-person inspections threefold.

According to data from the Pension and Social Insurance Fund of Russia, nearly a third of workplace accidents (27.8%) occur due to employees' personal negligence. Dangerous or illegal actions at facilities and in public spaces are typically tracked with CCTV systems. Most often, the footage is reviewed manually, but this method is imperfect: it is difficult for a person to monitor several screens at once, and important events may be missed because of fatigue or distraction.

This process can be, and increasingly is, automated with neural networks that continuously monitor events and flag the important ones in a long video stream. However, each existing solution has its limitations. For instance, the industrial surveillance models on the Russian market can only detect objects (e.g., masks or helmets) or people, but cannot track their actions. The models available internationally can identify actions, but lack accuracy: evaluated on the dataset assembled by ITMO scientists, they were correct in only 24% (VideoMAE) and 48% (Hiera) of cases.

Experts from ITMO’s Computer Technologies Laboratory have developed ActionFormer, an algorithm that detects ten actions with 80% accuracy: for example, it can tell when an industrial employee is smoking or eating in the workplace, getting distracted by a phone call, moving equipment without permission, or entering restricted areas. The solution can also flag sabotage, such as a camera lens being smeared or blocked to hide prohibited activity. At many industrial sites, these actions are treated as safety violations because they can lead to serious consequences.

The new algorithm consists of two models that analyze sequences of images: one marks human silhouettes with “skeleton” keypoints, while the other uses that data to locate employees and classify their actions. Depending on the client’s needs, information about illegal or potentially dangerous activities is either logged to a database or sent directly to an operator.

Neural network-generated “skeleton” dots on an image of a person smoking
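The two-stage pipeline described above (pose keypoints first, action classification second) can be sketched roughly as follows. This is a minimal illustrative mock-up, not the actual ITMO implementation: every name in it (`extract_keypoints`, `ActionClassifier`, the label set) is an assumption, and the pose model is replaced by a stub.

```python
import numpy as np

ACTIONS = ["working", "smoking", "eating", "phone_call"]

def extract_keypoints(frame: np.ndarray) -> np.ndarray:
    """Stub for the first model (pose estimation): returns 17 (x, y) keypoints.
    A real system would run a trained skeleton detector on the frame."""
    h, w = frame.shape[:2]
    rng = np.random.default_rng(0)
    return rng.uniform([0, 0], [w, h], size=(17, 2))

class ActionClassifier:
    """Toy stand-in for the second model: classifies a keypoint sequence."""
    def predict(self, keypoint_seq: np.ndarray) -> str:
        # keypoint_seq has shape (num_frames, 17, 2); a real model would be
        # a small convolutional network over this tensor, not a lookup rule.
        assert keypoint_seq.ndim == 3 and keypoint_seq.shape[1:] == (17, 2)
        idx = int(keypoint_seq.mean()) % len(ACTIONS)  # placeholder rule
        return ACTIONS[idx]

def analyze_clip(frames) -> str:
    """Run the full pipeline on a list of video frames."""
    seq = np.stack([extract_keypoints(f) for f in frames])
    return ActionClassifier().predict(seq)

frames = [np.zeros((480, 640, 3), dtype=np.uint8) for _ in range(8)]
print(analyze_clip(frames))
```

The point of the structure is that only the compact keypoint tensor, not the raw video, reaches the classifier, which is what keeps the second stage small.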

Compared to its currently available counterparts, the new model requires fewer resources to run because it contains a relatively small number of parameters (3.7 million). This was made possible by a convolutional architecture: instead of analyzing the entire image, it considers specific keypoints and object masks. Other models carry far more parameters: for instance, 22 million or, in Hiera’s case, 73 million, while Tarsier has 10 billion and OpenVLA 7 billion; these larger models accordingly require more resources to run.
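As a rough illustration of how such parameter counts arise, a standard convolutional layer with a k×k kernel contains out_channels × (in_channels × k² + 1) parameters (weights plus one bias per output channel). The layer sizes below are invented for the example and are not the ITMO model’s actual architecture:

```python
def conv_params(in_ch: int, out_ch: int, k: int) -> int:
    """Parameter count of a 2D conv layer: weights plus one bias per filter."""
    return out_ch * (in_ch * k * k + 1)

# A hypothetical small stack operating on 2-channel keypoint/mask input.
layers = [(2, 64, 3), (64, 128, 3), (128, 256, 3)]
total = sum(conv_params(i, o, k) for i, o, k in layers)
print(total)  # -> 370240, i.e. ~0.37 million parameters
```

Stacks of this kind stay in the millions of parameters, whereas transformer-based video models such as Tarsier or OpenVLA reach the billions.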

The neural network was trained on over 180,000 images; for this, the team used not only open-source datasets, but also videos they recorded themselves.

The solution is already in use at a major site in Perm Krai. Thanks to the system, the enterprise has reduced the number of physical safety compliance checks threefold and avoided several serious errors. For example, the algorithm prevented an improper equipment repair by detecting that an employee was distracted by a phone call.

The algorithm is openly available, meaning users can train the model to recognize other actions, provided they collect the necessary training dataset.
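As a sketch of what “collecting the necessary training dataset” might involve in practice, one common convention is a manifest file mapping labeled video clips to action classes. The CSV layout, file names, and label names here are purely hypothetical, not the project’s actual data format:

```python
import csv
import io

# Hypothetical labeled clips collected from a site's own cameras.
clips = [
    ("cam01_0001.mp4", "smoking"),
    ("cam01_0002.mp4", "phone_call"),
    ("cam02_0001.mp4", "normal_work"),
]

buf = io.StringIO()
writer = csv.writer(buf, lineterminator="\n")
writer.writerow(["clip_path", "action_label"])  # header row
writer.writerows(clips)
manifest = buf.getvalue()
print(manifest)
```

A training script would then read such a manifest to pair each clip with its target label.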

“In the future, we want to train the model on a larger set of actions. Our next task is to adapt the action recognition system for wearable cameras. For example, it could be used in mines during safety briefings to verify that the team is performing the necessary actions and following safety regulations, such as wearing protective gear or safely climbing down a ladder,” shares Valeria Efimova, the head of the project, PhD in engineering, and a researcher at ITMO’s Computer Technologies Laboratory.

Moreover, another version of the model, intended to register illegal activities on the grounds of apartment buildings, is already in the works. For this, the algorithm was trained on 150,000 images of various scenarios, including drinking at playgrounds, trucks unloading in unauthorized places, and trespassing attempts. In the future, the team will add other scenarios so that the application can identify those who deface playgrounds or walk on lawns. This version is planned for release in October 2025.

The project team includes ITMO students Anastasia Shpileva, Maksim Koltakov, and Georgy Petrov (Information Technologies and Programming Faculty) and Ruslan Zaripov (Institute of Applied Computer Science).
