ELSE Corp’s definition of Style for AI and 3D Visual Search
Note: below is a simplified, “public-ready” description of the activity managed by ELSE Corp and its subsidiary ELSE Tech on the research direction defined internally as 3D VISUAL SEARCH, part of the ELSE.ai framework.
ELSE Corp’s Visual Product Recommendation system is based on visual similarity and currently applies to fashion products such as shoes (the main expertise and market of the company’s core business).
To define Product Similarity for a machine in a way that humans find understandable and acceptable, we have to take human perception of visual similarity into account. For this reason, we introduce an internal definition of Product Style and try to make the AI understand our (human- and expert-defined) definition of Style from the machine’s point of view as well. Computationally modeling aspects of the human visual system led us to define a vector for style, which we call the Style-Vector. According to human perception and the human visual system, one of the most important parameters for recognizing any object is shape: humans can recognize many things even in fairly dark environments where little color is visible. We value shape as highly as human perception does, so we define a number of categories that either differ clearly in shape or contain components that separate them as much as a difference in shape would.
Deep Learning is a hot topic with high potential and is, without doubt, sufficient in many cases, BUT it might not be the best solution for small-data environments like the CREATIVE INDUSTRIES (Fashion & Design). Deep Learning can be a very good companion to solutions designed for small data, but we believe it is certainly not enough to train a deep net, or even to apply transfer learning, and expect it to understand style the way we understand it: driven by an engineering approach to product design and based on CAD and PDM data.
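To make “shape” something a machine can compare, one classical option is a set of moment invariants computed on a product’s silhouette. The sketch below is illustrative only: it assumes a binary silhouette mask is already available (for example rendered from the 3D model or obtained by segmentation), and the function name `hu_moments` is our own, not part of ELSE.ai. It computes the first four Hu moments with NumPy; these values do not change when the silhouette is moved or rotated, which is the kind of stability a shape descriptor needs.

```python
import numpy as np

def hu_moments(mask: np.ndarray) -> np.ndarray:
    """First four Hu moment invariants of a binary silhouette mask.

    Illustrative shape descriptor; assumes the silhouette is already segmented.
    """
    ys, xs = np.nonzero(mask)              # row (y) and column (x) of foreground pixels
    m00 = xs.size                          # area = zeroth-order moment
    if m00 == 0:
        raise ValueError("mask has no foreground pixels")
    dx = xs - xs.mean()                    # coordinates relative to the centroid
    dy = ys - ys.mean()

    def eta(p: int, q: int) -> float:
        # normalized central moment (translation-invariant, scale-normalized)
        return float(np.sum(dx ** p * dy ** q)) / m00 ** (1 + (p + q) / 2)

    n20, n02, n11 = eta(2, 0), eta(0, 2), eta(1, 1)
    n30, n03, n21, n12 = eta(3, 0), eta(0, 3), eta(2, 1), eta(1, 2)
    return np.array([
        n20 + n02,
        (n20 - n02) ** 2 + 4 * n11 ** 2,
        (n30 - 3 * n12) ** 2 + (3 * n21 - n03) ** 2,
        (n30 + n12) ** 2 + (n21 + n03) ** 2,
    ])

# Toy "L"-shaped silhouette standing in for a shoe profile.
mask = np.zeros((40, 40), dtype=bool)
mask[5:30, 5:12] = True
mask[23:30, 5:25] = True
shifted = np.roll(mask, (6, 8), axis=(0, 1))   # same shape, moved
rotated = np.rot90(mask)                        # same shape, turned 90 degrees
print(np.allclose(hu_moments(mask), hu_moments(shifted)))  # True
print(np.allclose(hu_moments(mask), hu_moments(rotated)))  # True
```

Moment invariants are only one possible global shape descriptor; contour- or 3D-based descriptors could fill the same slot in a style vector.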
Fine-grained analysis and feature extraction increase the discriminative power of our AI when it faces shoes that differ only slightly, and also help the system better cluster computed parameters such as SIFT features. After all, style is not a crisp model, so we will use fuzzy logic to determine style similarity. To build a fuzzy membership area around each defined style (profile style) and measure how similar any given style vector is to this profile, we can add a tag for Brand-based attributes. This parameter will be reinforced by human experts: we all know that a brand producing office clothing may be more similar to another elegant brand than to a sport brand focused on city running. Field experts will decide how brands are clustered into Brand Similarity groups, so this parameter will reflect their decisions.
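A fuzzy membership area around a profile style can be sketched as a membership function that equals 1 at the profile and decays with distance. The example below assumes a Gaussian membership function and a simple rule that raises membership when both brands share an expert-defined Brand Similarity group; the function `profile_membership`, the `BRAND_GROUPS` table, its brand names and the `brand_boost` value are all hypothetical placeholders, not ELSE.ai parameters.

```python
import numpy as np

# Hypothetical expert-curated Brand Similarity groups; names are placeholders.
BRAND_GROUPS = {
    "office_brand_x": "elegant",
    "formal_brand_y": "elegant",
    "runner_brand_z": "city_running",
}

def profile_membership(style_vec, profile_vec, width=1.0,
                       brand=None, profile_brand=None, brand_boost=0.25):
    """Fuzzy membership (0..1) of a style vector in a profile style.

    Assumption: Gaussian membership centred on the profile, nudged upwards
    when both brands belong to the same expert-defined group.
    """
    diff = np.asarray(style_vec, dtype=float) - np.asarray(profile_vec, dtype=float)
    mu = float(np.exp(-np.dot(diff, diff) / (2.0 * width ** 2)))
    group = BRAND_GROUPS.get(brand)
    if group is not None and group == BRAND_GROUPS.get(profile_brand):
        # move membership a fraction of the way towards full membership
        mu += brand_boost * (1.0 - mu)
    return mu

profile = [0.8, 0.1, 0.5]
candidate = [0.7, 0.2, 0.4]
plain = profile_membership(candidate, profile, width=0.3)
same_group = profile_membership(candidate, profile, width=0.3,
                                brand="office_brand_x", profile_brand="formal_brand_y")
print(plain, same_group)   # the same-group value is higher
```

Keeping the brand rule outside the distance computation lets experts regroup brands without recomputing any visual descriptor.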
Our main challenge is to connect Human Perception and Machine Language at a common point. Our solution is to compute local and global descriptors and create a Bag of Words (BoW) followed by an Adaptive Dictionary.
As mentioned above, shape is important, but it is not the only global descriptor: color is a very important measure both for describing a shoe and for recommendation. We will use color histograms to compute the overall presence of RGB colors in three separate histograms, one per channel. Concatenating these three histograms gives an RGB histogram that can be compared with any other histogram of the same type; the comparison method is the Histogram Intersection Kernel (HIK). This color parameter is considered computational because, being a histogram, it may mean nothing to human eyes, but it is certainly a comparable factor for the machine. The HIK value can later be used for recommendation in a Content-Based Image Retrieval (CBIR) model.
For local descriptors we can use SIFT/SURF features, or VLAD at a larger scale, to create the dictionary. Some parameters, such as main color and material, are assigned to the style vector to further customize the attributes the customer wants. Other parameters lie outside the style vector; we call them detection/recognition attributes, and they affect the fuzzification of the weights assigned to the customer’s most similar previous experiences. These attributes can be: logo, specific components, specific patterns and …
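The color descriptor described above can be sketched in a few lines: one histogram per RGB channel, concatenated, then compared with the Histogram Intersection Kernel. The bin count (16 per channel), the normalization step and the helper names `rgb_histogram`, `hik` and `solid` are our assumptions for illustration.

```python
import numpy as np

def rgb_histogram(image: np.ndarray, bins: int = 16) -> np.ndarray:
    """Concatenate the R, G and B histograms of an H x W x 3 uint8 image."""
    per_channel = [
        np.histogram(image[..., c], bins=bins, range=(0, 256))[0]
        for c in range(3)
    ]
    hist = np.concatenate(per_channel).astype(float)
    return hist / hist.sum()     # normalize so images of any size are comparable

def hik(h1: np.ndarray, h2: np.ndarray) -> float:
    """Histogram Intersection Kernel: sum of bin-wise minima (1.0 = identical)."""
    return float(np.minimum(h1, h2).sum())

def solid(rgb, size=32):
    """Toy single-color image standing in for a product photo or render."""
    img = np.zeros((size, size, 3), dtype=np.uint8)
    img[...] = rgb
    return img

red = rgb_histogram(solid((200, 0, 0)))
similar_red = rgb_histogram(solid((205, 10, 0)))   # lands in the same bins as red
blue = rgb_histogram(solid((0, 0, 200)))
print(hik(red, similar_red))   # 1.0
print(hik(red, blue))          # about 0.333: only the green channel overlaps
```

Because the result lies between 0 and 1 for normalized histograms, the HIK value can be used directly as a similarity score when ranking candidates in a CBIR setting.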
Defining a style vector gives us a kind of universal language for representing any given shoe. The vector is neither purely computed by the AI, which would not be understandable by humans, nor purely declarative, which would not be understandable by the machine: it is both, and it acts as the bridge between the two.
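One way to picture a vector that is both computed and declarative is a record holding machine-computed descriptors next to expert-assigned labels, flattened into a single numeric array. In the sketch below the `StyleVector` class, the label vocabularies, the one-hot encoding and the weights are hypothetical choices made for illustration, not ELSE.ai’s actual layout.

```python
from dataclasses import dataclass
import numpy as np

# Hypothetical expert vocabularies for the declarative (human-readable) part.
CATEGORIES = ["sneaker", "boot", "pump", "loafer"]
MAIN_COLORS = ["black", "brown", "white", "red", "blue"]
MATERIALS = ["leather", "suede", "textile", "synthetic"]

def one_hot(value: str, vocabulary: list) -> np.ndarray:
    """Encode an expert label as a 0/1 vector; unknown labels raise ValueError."""
    vec = np.zeros(len(vocabulary))
    vec[vocabulary.index(value)] = 1.0
    return vec

@dataclass
class StyleVector:
    shape: np.ndarray       # computed part, e.g. shape moment invariants
    color: np.ndarray       # computed part, e.g. normalized RGB histogram
    category: str           # declarative part, assigned by experts
    main_color: str
    material: str
    w_shape: float = 1.0    # illustrative weights, not tuned values
    w_color: float = 1.0
    w_tags: float = 1.0

    def to_array(self) -> np.ndarray:
        """Concatenate weighted computed and declarative parts into one vector."""
        return np.concatenate([
            self.w_shape * np.asarray(self.shape, dtype=float),
            self.w_color * np.asarray(self.color, dtype=float),
            self.w_tags * one_hot(self.category, CATEGORIES),
            self.w_tags * one_hot(self.main_color, MAIN_COLORS),
            self.w_tags * one_hot(self.material, MATERIALS),
        ])

boot = StyleVector(
    shape=np.array([0.20, 0.01, 0.0, 0.0]),
    color=np.full(48, 1 / 48),
    category="boot", main_color="black", material="leather",
)
vec = boot.to_array()
print(vec.shape)  # (65,)
```

Humans can still read the labeled fields of the record, while the flattened array is what distance or membership functions consume.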
Being able to represent any given shoe, whether from the gallery or from an image supplied by the customer, provides a solid foundation for style-based recommendation systems. Fuzzy clustering of shoe styles, followed by computing the membership function value of a new input image, gives us the percentage of similarity of that shoe to each of our style groups.
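Fuzzy c-means is one standard way to obtain such membership percentages. The sketch below assumes style vectors are plain numeric arrays and uses Euclidean distance with fuzzifier m = 2; these choices, the toy 2-D data and the function names `fuzzy_c_means` and `fcm_membership` are ours for illustration, not documented ELSE.ai settings.

```python
import numpy as np

def fcm_membership(x, centers, m=2.0):
    """Fuzzy c-means membership of one vector in each cluster; values sum to 1."""
    x = np.asarray(x, dtype=float)
    centers = np.asarray(centers, dtype=float)
    d = np.linalg.norm(centers - x, axis=1)     # Euclidean distance to each centre
    if np.any(d == 0):                          # x sits exactly on a centre
        u = (d == 0).astype(float)
        return u / u.sum()
    ratio = (d[:, None] / d[None, :]) ** (2.0 / (m - 1.0))
    return 1.0 / ratio.sum(axis=1)

def fuzzy_c_means(X, c, m=2.0, iters=100, seed=0):
    """Plain fuzzy c-means: alternate centre and membership updates."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    U = rng.random((len(X), c))
    U /= U.sum(axis=1, keepdims=True)           # random initial memberships
    for _ in range(iters):
        W = U ** m
        centers = (W.T @ X) / W.sum(axis=0)[:, None]
        U = np.array([fcm_membership(x, centers, m) for x in X])
    return centers, U

# Toy 2-D "style vectors" forming two style groups.
X = [[0.0, 0.0], [0.2, 0.1], [0.1, 0.3], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
centers, _ = fuzzy_c_means(X, c=2)
new_shoe = [0.5, 0.5]
percent = 100 * fcm_membership(new_shoe, centers)
print(np.round(percent, 1))   # one value per style group, summing to 100
```

Clustering is done once on the gallery; a new customer image only needs `fcm_membership`, which is cheap enough to run at query time.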
More info is coming soon… stay tuned!
Further reading: see how it works for the RETAIL SCENARIOS.