Machine learning (ML/AI) based data classification for underwater sound and noise analysis


  • ARADI Attila
  • VARGA Attila Károly


hydrophone, bioacoustics, machine learning, artificial intelligence, underwater noise, acoustic communication of cetaceans, classification of underwater sounds, neural network (NN)


In this article, we present a machine learning technique based on artificial intelligence (AI), specifically a neural network (NN), and apply it to classifying underwater sounds, underwater noise, and the vocalizations of marine mammals (dolphins and other cetaceans) recorded with hydrophones. Data collection produces a large volume of underwater recordings. Processing them by hand is time-consuming, and the monotonous nature of the work makes human error and lapses of attention likely. With this in mind, we examine the use of AI for post-processing the recordings and classifying the sounds: down to species-level recognition for dolphins and other cetaceans, and down to vessel-type recognition for ship traffic noise.
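The abstract does not say how a recording is turned into input for the neural network. A common first step in audio classification is a log-magnitude spectrogram, and the sketch below illustrates that step only, not the authors' pipeline. It uses the Python standard library; the function names, the frame length, the hop size, the 8 kHz sampling rate, and the synthetic tones standing in for hydrophone data are all assumptions made for illustration.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames of equal length."""
    last_start = len(samples) - frame_len
    return [samples[s:s + frame_len] for s in range(0, last_start + 1, hop)]


def magnitude_spectrum(frame):
    """Magnitudes of a Hann-windowed naive DFT for bins 0..N/2."""
    n = len(frame)
    windowed = [x * (0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)))
                for i, x in enumerate(frame)]
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * i / n)
                    for i, x in enumerate(windowed)))
            for k in range(n // 2 + 1)]


def log_spectrogram(samples, frame_len=64, hop=32):
    """Log-magnitude spectrogram: one row per frame, one column per bin."""
    return [[math.log10(m + 1e-10) for m in magnitude_spectrum(f)]
            for f in frame_signal(samples, frame_len, hop)]


def dominant_frequency(samples, sample_rate, frame_len=64, hop=32):
    """Frequency in Hz of the strongest bin after averaging over frames."""
    spec = log_spectrogram(samples, frame_len, hop)
    n_bins = len(spec[0])
    mean_per_bin = [sum(row[k] for row in spec) / len(spec)
                    for k in range(n_bins)]
    peak_bin = max(range(n_bins), key=lambda k: mean_per_bin[k])
    return peak_bin * sample_rate / frame_len


def tone(freq_hz, sample_rate=8000, n_samples=512):
    """Synthetic pure tone, a stand-in for a real hydrophone recording."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate)
            for i in range(n_samples)]


# Assumed toy signals: a high tone (whistle-like) and a low tone (engine-like).
whistle_like = tone(2000)
engine_like = tone(250)
print(dominant_frequency(whistle_like, 8000))
print(dominant_frequency(engine_like, 8000))
```

In practice the two-dimensional output of `log_spectrogram` (time frames by frequency bins), rather than a single dominant frequency, would be the kind of array passed to a classifier. A library FFT would replace the naive DFT used here for readability.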

