Visual perception of liquids: Insights from deep neural networks

PLoS Comput Biol. 2020 Aug 19;16(8):e1008018. doi: 10.1371/journal.pcbi.1008018. eCollection 2020 Aug.

Abstract

Visually inferring material properties is crucial for many tasks, yet poses significant computational challenges for biological vision. Liquids and gels are particularly challenging due to their extreme variability and complex behaviour. We reasoned that measuring and modelling viscosity perception is a useful case study for identifying general principles of complex visual inferences. In recent years, artificial Deep Neural Networks (DNNs) have yielded breakthroughs in challenging real-world vision tasks. However, to model human vision, the emphasis lies not on best possible performance, but on mimicking the specific pattern of successes and errors humans make. We trained a DNN to estimate the viscosity of liquids using 100,000 simulations depicting liquids with sixteen different viscosities interacting in ten different scenes (stirring, pouring, splashing, etc.). We find that a shallow feedforward network trained for only 30 epochs predicts mean observer performance better than most individual observers. This is the first successful image-computable model of human viscosity perception. Further training improved accuracy, but predicted human perception less well. We analysed the network's features using representational similarity analysis (RSA) and a range of image descriptors (e.g. optic flow, colour saturation, GIST). This revealed clusters of units sensitive to specific classes of feature. We also find a distinct population of units that are poorly explained by hand-engineered features, but which are particularly important both for physical viscosity estimation, and for the specific pattern of human responses. The final layers represent many distinct stimulus characteristics, not just viscosity, which the network was trained on.
Retraining the fully-connected layer with a reduced number of units achieves practically identical performance, but results in representations focused on viscosity, suggesting that network capacity is a crucial parameter determining whether artificial or biological neural networks use distributed vs. localized representations.
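The representational similarity analysis (RSA) mentioned in the abstract compares two representations by correlating their representational dissimilarity matrices (RDMs), so geometries can be compared across networks, feature descriptors, and observers without a unit-to-unit mapping. A minimal sketch of that second-order comparison, using random stand-in data rather than the study's stimuli or network activations (assumes NumPy and SciPy):

```python
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

def rdm(activations):
    """Condensed representational dissimilarity matrix:
    1 - Pearson correlation between the activation patterns
    of every pair of stimuli (rows)."""
    return pdist(activations, metric="correlation")

def rsa_score(act_a, act_b):
    """Second-order similarity: Spearman correlation between two RDMs."""
    rho, _ = spearmanr(rdm(act_a), rdm(act_b))
    return rho

# Toy demo: 16 hypothetical stimuli (e.g. one per viscosity level)
# with 128 feature dimensions each.
rng = np.random.default_rng(0)
layer = rng.normal(size=(16, 128))                    # stand-in layer activations
noisy_copy = layer + 0.1 * rng.normal(size=layer.shape)
unrelated = rng.normal(size=(16, 128))

# A representation with the same geometry should score higher
# than an unrelated one.
print(rsa_score(layer, noisy_copy), rsa_score(layer, unrelated))
```

Because RSA operates on pairwise dissimilarities rather than raw units, the same `rsa_score` can compare a DNN layer against hand-engineered descriptors (optic flow, GIST) or against human perceptual dissimilarities, which is how feature clusters like those described above can be identified.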

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adult
  • Computational Biology
  • Female
  • Humans
  • Male
  • Models, Neurological*
  • Neural Networks, Computer*
  • Viscosity*
  • Visual Perception / physiology*
  • Young Adult

Grants and funding

JJRvA was funded by the European Research Council (ERC) Consolidator Award ‘SHAPE’–project number ERC-CoG-2015-682859 (https://erc.europa.eu), by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP15H05915 (https://www.jsps.go.jp/english/), and a Google Faculty Research Award (https://ai.google/research/outreach/faculty-research-awards/). SN was funded by Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Numbers JP15H05915 and 20H00603. RWF was funded by Deutsche Forschungsgemeinschaft (DFG, German Research Foundation, https://www.dfg.de/en/)–project number 222641018–SFB/TRR 135 TP C1, by the European Research Council (ERC) Consolidator Award ‘SHAPE’–project number ERC-CoG-2015-682859, and a Google Faculty Research Award. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.