Our technology is based on a continuous psychological model that recognizes distinct emotional states across a wide spectrum of acoustic emotion. This allows a richer representation and provides more flexibility in identifying emotional properties of the voice. The model defines emotions as regions in a three-dimensional continuous space, where each dimension represents an emotion primitive shared by all human emotions: Valence, Dominance and Activation.
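As an illustration of the idea, an emotion can be treated as a point in this three-dimensional primitive space and related to prototype regions by distance. The sketch below is not our production model: the axis normalization, the prototype coordinates, and the names `EmotionPoint` and `nearest_prototype` are all assumptions made for the example.

```python
from dataclasses import dataclass

@dataclass
class EmotionPoint:
    """A point in the three-dimensional emotion-primitive space.

    Each axis is assumed here to be normalized to [-1, 1]:
    valence    : unpleasant to pleasant
    activation : calm to excited
    dominance  : submissive to dominant
    """
    valence: float
    activation: float
    dominance: float

def distance(a: EmotionPoint, b: EmotionPoint) -> float:
    """Euclidean distance between two points in the primitive space."""
    return ((a.valence - b.valence) ** 2
            + (a.activation - b.activation) ** 2
            + (a.dominance - b.dominance) ** 2) ** 0.5

# Illustrative prototype centers only; the coordinates are assumptions,
# not values taken from our model.
PROTOTYPES = {
    "anger":   EmotionPoint(valence=-0.6, activation=0.8, dominance=0.6),
    "sadness": EmotionPoint(valence=-0.7, activation=-0.6, dominance=-0.5),
    "joy":     EmotionPoint(valence=0.8, activation=0.6, dominance=0.4),
}

def nearest_prototype(p: EmotionPoint) -> str:
    """Label a point with its closest prototype emotion."""
    return min(PROTOTYPES, key=lambda name: distance(p, PROTOTYPES[name]))

print(nearest_prototype(EmotionPoint(0.5, 0.4, 0.2)))
```

A hard nearest-prototype label is of course the crudest reading of the space; the continuous coordinates themselves carry the useful gradations.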
This approach holds great potential for modeling how emotions occur in the real world. In realistic scenarios, emotions are rarely produced in a prototypical or pure form; they are often complex emotional states, mixtures of emotions with varying degrees of intensity or expressiveness. We represent emotional states using statistical models and fuzzy clustering in order to extract more comprehensive and descriptive information. The relationship between acoustic features and emotion primitives is modeled with a statistical approach based on acoustic information, and the emotion primitives and their relationship to emotional states are then interpreted with fuzzy logic models that capture the emotion phenomenon. We have tested about 7,000 acoustic features covering prosody, voice quality and spectral information.
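To make the mixed-emotion idea concrete, a fuzzy-c-means-style membership function can assign an utterance partial degrees of belonging to several emotion clusters at once. This is a minimal sketch of that general technique, not our actual pipeline: the cluster centers, the fuzzifier `m`, and the function name `fuzzy_memberships` are assumptions for illustration.

```python
import math

def fuzzy_memberships(point, centers, m=2.0):
    """Fuzzy-c-means-style membership of `point` in each emotion cluster.

    `point` is a (valence, activation, dominance) tuple; `centers` maps
    emotion labels to cluster centers in the same space. Returns a dict
    of membership degrees in [0, 1] that sum to 1, so one utterance can
    belong partially to several emotion categories at once.
    """
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    d = {label: dist(point, c) for label, c in centers.items()}
    # A point coinciding with a center belongs fully to that cluster.
    for label, di in d.items():
        if di == 0.0:
            return {k: (1.0 if k == label else 0.0) for k in centers}
    exponent = 2.0 / (m - 1.0)  # standard fuzzy-c-means membership exponent
    return {label: 1.0 / sum((d[label] / d[other]) ** exponent
                             for other in centers)
            for label in centers}

centers = {  # illustrative centers; real clusters would be learned from data
    "anger":   (-0.6,  0.8,  0.6),
    "sadness": (-0.7, -0.6, -0.5),
    "joy":     ( 0.8,  0.6,  0.4),
}
mix = fuzzy_memberships((0.1, 0.5, 0.2), centers)
print({k: round(v, 2) for k, v in mix.items()})
```

For a point between the joy and anger regions, the output is a blend dominated by joy with a substantial anger component, which is exactly the kind of mixed-state description a hard classifier cannot give.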