Arbitrary Font Generation by Encoder Learning of Disentangled Features

Sensors (Basel). 2022 Mar 19;22(6):2374. doi: 10.3390/s22062374.

Abstract

Making a new font requires graphical designs for all base characters, and this designing process consumes lots of time and human resources. Especially for languages including a large number of combinations of consonants and vowels, it is a heavy burden to design all such combinations independently. Automatic font generation methods have been proposed to reduce this labor-intensive design problem. Most of the methods are GAN-based approaches, and they are limited to generate the trained fonts. In some previous methods, they used two encoders, one for content, the other for style, but their disentanglement of content and style is not sufficiently effective in generating arbitrary fonts. Arbitrary font generation is a challenging task because learning text and font design separately from given font images is very difficult, where the font images have both text content and font style in each image. In this paper, we propose a new automatic font generation method to solve this disentanglement problem. First, we use two stacked inputs, i.e., images with the same text but different font style as content input and images with the same font style but different text as style input. Second, we propose new consistency losses that force any combination of encoded features of the stacked inputs to have the same values. In our experiments, we proved that our method can extract consistent features of text contents and font styles by separating content and style encoders and this works well for generating unseen font design from a small number of reference font images that are human-designed. Comparing to the previous methods, the font designs generated with our method showed better quality both qualitatively and quantitatively than those with the previous methods for Korean, Chinese, and English characters. e.g., 17.84 lower FID in unseen font compared to other methods.

Keywords: arbitrary font generation; consistency loss; feature disentanglement; hallucinated input; stacked input.

MeSH terms

  • Humans
  • Language*
  • Learning*