LogicNet: A Logical Consistency Embedded Face Attribute Learning Network

Haiyu Wu\(^{1}\), Sicong Tian\(^{2}\), Huayu Li\(^3\), Kevin W. Bowyer\(^{1}\)
\(^{1}\)University of Notre Dame
\(^2\)Indiana University South Bend
\(^3\)University of Arizona


Abstract

Ensuring logical consistency in predictions is a crucial yet overlooked aspect of multi-attribute classification. We explore the potential reasons for this oversight and introduce two pressing challenges to the field: 1) How can we ensure that a model, when trained with data checked for logical consistency, yields predictions that are logically consistent? 2) How can we achieve the same with data that has not undergone logical consistency checks? Minimizing manual effort is also essential for enhancing automation. To address these challenges, we introduce two datasets, FH41K and CelebA-logic, and propose LogicNet, an adversarial training framework that learns the logical relationships between attributes. Accuracy of LogicNet surpasses that of the next-best approach by 23.05%, 9.96%, and 1.71% on FH37K, FH41K, and CelebA-logic, respectively. In real-world case analysis, our approach achieves a reduction of more than 50% in the average number of failed cases compared to other methods.

1 Introduction↩︎

Where does the logical consistency problem arise in computer vision? In the realm of multi-attribute classification, models are trained to predict the attributes represented in a given image. Examples include facial attributes [1][3], clothing styles [4], [5], animal attributes [6], human action recognition [7], and others. Whenever multiple attributes are predicted for an image, logical relationships may potentially exist among these attributes. For instance, in a popular scheme for predicting attributes in face images [1], the attributes goatee, no beard, and mustache are predicted independently. Logically, however, if no beard is predicted as true, then both goatee and mustache should be predicted as false. Note that, in CelebA, mustache is a type of beard based on the ground truth. Logical consistency issues may arise in more subtle interactions as well. For example, if wearing hat is predicted as true, then the information required to make a prediction for bald is occluded. Similarly, if wearing sunglasses is predicted as true, then the information to predict narrow eyes or eyes closed is occluded. Analogous logical relationships exist in the clothing style dataset; long-sleeve and sleeveless cannot both be predicted as true for a single garment. If floral is predicted as false, then floral print cannot logically be predicted as true, since it is a specific type of floral design. It is evident that issues of logical consistency emerge in computer vision, particularly in the area of multi-attribute classification.

Figure 1: Which face attribute classifier would you integrate into your system? Raw average accuracy of two facial attribute classifiers is given above in Orange. When logical consistency across attributes is taken into account, effective accuracy (in Purple) is reduced. Higher raw accuracy does not necessarily translate to higher effective accuracy.

Why is logical consistency of predictions important? Face and body attributes have been extensively utilized in various research domains, including face matching/recognition [2], [8][12], re-identification [13][15], training GANs [16][19], bias analysis [3], [20][23], and others. For a fair accuracy comparison across demographic groups, it is pivotal to balance the distribution of non-demographic attributes among the groups [23]. To train a GAN that edits a given attribute, it is necessary to classify training images by that attribute. However, if images exhibit logically inconsistent sets of attribute values, these applications of the attributes become problematic and prone to errors. For example, if a group wants to understand how facial hair affects face recognition accuracy across demographic groups, they have to tightly control variation in facial hair. However, if a model predicts {clean-shaven and beard-length-short} or {beard-at-chin-area and full-beard} for the same image, such predictions place the same image in two conflicting categories and distort the resulting statistics. Hence, logical consistency of attribute predictions is crucial for essentially all higher-level computer vision tasks.

Why has logical consistency not received more attention? 1) Higher complexity and cost of considering logical relationships during attribute labeling. Labeling training images with attribute values is already a labor-intensive task; requiring the manually-assigned meta-data to also be logically consistent makes it even more demanding. 2) Predominant focus on algorithmic accuracy over ground truth accuracy. Researchers often prioritize achieving accuracy improvements on established benchmarks, which is commendable. However, as accuracy levels approach a plateau, there may be a misconception that the problem has been resolved, whereas the plateau might merely reflect the level of (in)consistency in the attribute values within the training data. 3) Ambiguity of attribute names. CelebA is a notable face attribute dataset, but  [24], [25] report that ambiguous attributes, such as “high cheekbones”, “pointy nose”, and “oval face”, are a serious problem. This is a problem not only of CelebA but of all face attribute datasets that use similar attributes. The ambiguity hinders logical consistency research, since it is hard to find strong logical relationships between two ambiguous attributes. Consequently, none of the recent survey papers [26][29] mentions this crucial topic.

This paper introduces two challenging tasks to the domain of multi-attribute classification: (1) Training a model with labels that have been checked for logical consistency, aiming to improve the accuracy and logical consistency of predictions without involving post-processing steps; and (2) Training a model without labels that have been checked for logical consistency, also aiming to improve the accuracy and logical consistency of predictions without involving post-processing steps. The contributions of this work include:

  • Provide an explanation of why logical consistency of predictions is a crucial but overlooked topic, and introduce two challenging tasks.

  • Provide a larger benchmark, FH41K, with more samples and better balance across attributes, to better evaluate performance on facial hair attribute classification.

  • Provide annotations for the CelebA validation and test sets that have been cleaned with respect to logical relationships, to support a more challenging task: training a logically consistent model on data whose logical consistency has not been checked.

  • Propose an adversarial training method, LogicNet, to achieve higher accuracy and lower logical inconsistency across three datasets.

2 Related Work↩︎

In the NLP domain, logical reasoning is a crucial topic, and a detailed discussion appears in a recent survey [30]. There are various logical-reasoning-oriented benchmarks [31][35] that researchers use to mine the underlying rationales in order to improve the logical consistency of the results.

In the Computer Vision domain, a myriad of attribute relationships have been leveraged to enhance performance. These encompass positional relationships [36], [37], correlational relationships [38][40], logical relationships [3], [41], etc. Such relationships facilitate a deeper understanding and processing of visual data, thereby contributing to the advancement of the field. However, to the best of our knowledge, except for [3], none of the previous works considered the logical consistency of the predictions.  [3] proposed a Logical Consistency Prediction loss (LCPloss) to leverage the logical relationships between attributes and encourage logically consistent predictions. Tables 2 and 3 of this work indicate that, once logical consistency of predictions is taken into account, accuracy drops significantly. Although the proposed post-processing step, the label compensation (LC) strategy, removes a large number of logically inconsistent predictions, it is not a general solution and requires intensive manual work to design properly. Moreover, since existing multi-attribute classification datasets were assembled without considering logical relationships, and manually cleaning them is costly, how to make a model trained on an uncleaned dataset produce accurate and logically consistent predictions is a crucial open problem.

This paper reports on the first general method, LogicNet, for training a model to make logically consistent predictions. This work also provides benchmarks for understanding and designing approaches to further research on the problem of logical consistency of predictions.

3 Benchmarks↩︎

Figure 2: Logical relationships between attributes in CelebA. Strong: impossible in most cases. Weak: rarely possible in some cases. Independent/Ambiguous: the attributes are either ambiguous in definition [24], [25] (e.g., Attractive, High Cheekbones) or independent from the other attributes (e.g., Mouth Slightly Open, Sideburns). 5 O’ S and MSO mean 5 O’clock Shadow and Mouth Slightly Open.

The ambiguity of attribute names and the reasons listed in Section 1 result in a lack of datasets that are appropriate for evaluating model performance along the dimension of logical consistency.

FH37K is the first dataset whose annotations are checked for both logical consistency and accuracy. It contains 37,565 images, coming from a subset of CelebA [1] and a subset of WebFace260M [42]. Each image has 22 attributes of facial hair and baldness. However, due to the small number of positive samples for the attributes "Long" and "Bald Sides Only", insufficient train/val/test samples are a limitation of the FH37K dataset. To address this, we augment this dataset by adding more positive samples to the minority classes. Note that FH37K is still used as a benchmark dataset in this paper.

FH41K is our extended dataset based on FH37K. We added 3,712 images from 2,096 identities in WebFace260M [42], specifically to increase the number of positive examples of attributes that were underrepresented in FH37K. Specifically, we used the best facial hair classification model trained with FH37K1 to select the images that have a confidence value higher than 0.8 for both "Long" and "Bald Sides Only". We then engaged a human annotator, familiar with the documentation provided by  [3], to manually check the selected images in order to ensure the accuracy and logical consistency of the added images.

Both FH37K and FH41K have a set of rigorously defined rules based on logical relationships, including mutual exclusion, dependency, and collective exhaustion. The annotations are evaluated based on these relationships. However, generating training sets that have accurate and logically consistent sets of attribute labels is an expensive and time-consuming process. Existing datasets were created without considering the logical consistency of annotations. This raises an important question: is it possible to train a model to produce logically consistent attribute predictions using a training dataset that does not have logically consistent annotations? We compiled an additional dataset specifically for studying this question.

CelebA-logic is a variant of CelebA in which the logical relationships between attributes are checked for both the validation and test sets. Given the absence of a definitive guide to how the 40 attributes were marked and how each attribute is defined, we categorized the attribute relationships into three groups based on our knowledge, as shown in Figure 2. To make a fair set of logical rules, only Strong relationships are used to check logical consistency. Moreover,  [24], [25] reported that CelebA suffers from a substantial rate of inaccurate annotations. Hence, we conducted an annotation cleaning process for the strong-relationship attributes, building on the MSO-cleaned annotations [24]. To obtain cleaned facial hair and baldness related attributes, we converted the FH37K annotations back to the CelebA format and updated the labels of the corresponding images. Two human annotators then marked “Bangs”, “Receding Hairline” and “Male” based on the designed definitions for all the images in the validation and test sets. To ensure the consistency and accuracy of the new annotations, a third human annotator with knowledge of the definitions marked 1,000 randomly selected samples; the estimated consistency rate is 93.87%. As a result, 1) all images are cropped and aligned to 224x224 based on the given landmarks, 2) 975 images are omitted from the original dataset, 3) 63,557 (31.8%) images have at least one label different from the original, and 4) all test and validation annotations obey the Strong logical relationships.
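To make the checking procedure concrete, the sketch below shows one way such a Strong-rule check could be implemented. The rule pairs listed are illustrative examples drawn from the relationships discussed in this paper (e.g., No Beard excludes Goatee and Mustache); they are not the complete rule set used to clean CelebA-logic.

```python
# Minimal sketch of a Strong-rule consistency check for CelebA-style labels.
# The rule pairs below are illustrative examples; the full set follows Figure 2.
STRONG_EXCLUSIONS = [
    ("No_Beard", "Goatee"),           # no beard => goatee must be negative
    ("No_Beard", "Mustache"),         # no beard => mustache must be negative
    ("No_Beard", "5_o_Clock_Shadow"), # illustrative: another beard-type attribute
]

def strong_violations(labels: dict) -> list:
    """Return the Strong rules violated by a {attribute_name: 0/1} label dict."""
    return [(a, b) for a, b in STRONG_EXCLUSIONS
            if labels.get(a, 0) == 1 and labels.get(b, 0) == 1]

# Example: an annotation marked both No_Beard and Goatee fails the check.
print(strong_violations({"No_Beard": 1, "Goatee": 1, "Mustache": 0}))
```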

4 Proposed Method↩︎

To provide a solution for the challenges of logical consistency, we propose LogicNet, which exploits an adversarial training strategy and a label generation algorithm, Bag of Labels (BoL). LogicNet enables the classifier to learn the logical relationship between attributes, thereby enhancing the model’s capacity to generate logically consistent predictions.

Figure 3: The proposed LogicNet. The weights of the multi-attribute classifier and the discriminator are updated alternately. \(L'\) is either the predictions of the classifier or the poisoned labels from the BoL algorithm. \(L_{logic}\) is the logical consistency label vector.

4.1 Adversarial Training↩︎

We propose an adversarial training framework, shown in Figure 3, to compel the classifier \(\mathcal{C}\) to make logically consistent predictions while improving the accuracy of predictions. Formalizing the desired goal, we consider a set of training images \(X = \{x_1, x_2,..., x_N\}\), from which we want to train a model, \(\mathcal{F}(X)\), to project \(X\) to the ground truth labels \(L_{gt}=\{l_1, l_2,...,l_N\}\), where each \(l_i\) is the set of attribute labels of \(x_i\). The classification loss is the binary cross entropy loss: \[\begin{aligned} \mathcal{L}_{bce}(\mathcal{F}(X;\Phi), L_{gt}) = &-\frac{1}{N}\sum^{N}_{i=1}[l_{i}\log(\mathcal{F}(x_i;\Phi))\\&+ (1-l_{i})\log(1-\mathcal{F}(x_i;\Phi))] \end{aligned}\] where \(\Phi\) is the parameter vector of the classifier \(\mathcal{C}\). For the adversarial learning, a discriminator that can judge the logical consistency of the predictions is needed. Here, we use a simple and effective multi-headed self-attention network to give a probability, \(\mathcal{P}_{logic}\in [0, 1]\), for the logical consistency of labels, \(L'\). The loss of the multi-attribute classifier, \(\mathcal{L}_C\), becomes: \[\underset{\Phi}{\min}\; \underset{\Theta}{\max}\;(1-\lambda)\mathcal{L}_{bce}(\mathcal{F}(X;\Phi), L_{gt})-\lambda \log(\mathcal{D}(\mathcal{F}(X;\Phi);\Theta))\] where \(\mathcal{D}\) is the discriminator, whose parameters are frozen during this step, \(\Theta\) is the parameter vector of the discriminator, and \(\lambda\) is used for the loss trade-off.
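For concreteness, the following PyTorch-style sketch shows one alternating classifier update under this objective. It is a minimal illustration of our reading of the loss above, not the authors' released code: the classifier, discriminator, optimizer, and the trade-off weight `lam` are assumed to exist, and the adversarial term is written as \(-\lambda\log\mathcal{D}(\cdot)\).

```python
import torch
import torch.nn.functional as F

def classifier_step(classifier, discriminator, optimizer, images, labels_gt, lam=0.15):
    """One classifier update against a frozen discriminator (sketch, not released code).

    The discriminator is assumed to map a label vector to a probability in [0, 1]
    that the vector is logically consistent.
    """
    for p in discriminator.parameters():      # freeze the discriminator for this step
        p.requires_grad_(False)

    preds = torch.sigmoid(classifier(images)) # (batch, num_attributes)
    bce = F.binary_cross_entropy(preds, labels_gt)

    p_logic = discriminator(preds).clamp(1e-6, 1 - 1e-6)
    adv = -torch.log(p_logic).mean()          # reward predictions judged consistent

    loss = (1 - lam) * bce + lam * adv
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    for p in discriminator.parameters():      # unfreeze before the discriminator's turn
        p.requires_grad_(True)
    return loss.item()
```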

4.2 Bag of Labels↩︎

To train a discriminator, the straightforward approach [40] is to directly feed the predictions (ground truth labels) and treat them as negative (positive) samples. Since the training labels of CelebA are not yet cleaned, using them could mislead the discriminator and cause it to learn incorrect patterns. Hence, we propose the Bag of Labels (BoL) algorithm (Figure 4), which automatically generates logically inconsistent labels from the given ground truth labels while also checking the logical consistency of the original labels. This algorithm is used in two parts of the LogicNet approach: Condition Group Setup and Label Poisoning.

Condition Group Setup: To assign accurate logic labels \(L_{logic}\) to \(L_{gt}\) according to the rules, we separate the corresponding attributes of each rule into two groups, \(g_{c1}\) and \(g_{c2}\), where the attributes in \(g_{c1}[i]\) have strong logical relationships with the attributes in \(g_{c2}[i]\). For both FH37K and FH41K, we followed the rules given by [3]. For CelebA, we followed the rules in Figure 2.

Label Poisoning: To generate logically inconsistent labels, we first categorize the rules into three cases: inter-class impossible poisoning, intra-class impossible poisoning, and intra-class incomplete poisoning. Inter-class impossible poisoning aims to generate labels where the logical inconsistency occurs between attributes in different classes (e.g., Beard Area(clean shaven)=true and Beard Length(short)=true; no beard=true and goatee=true). Intra-class impossible and intra-class incomplete poisoning aim to generate labels where there are multiple positive predictions within one class (e.g., Beard Area(clean shaven)=true and Beard Area(chin area)=true) or no positive predictions within one class. These two poisoning strategies apply to FH37K and FH41K; attributes in CelebA do not have this level of detail and so do not have these logical relationships. After each poisoning, the initialized logic labels, \(L_{logic}\), are updated on-the-fly. The objective function is: \[\underset{\Theta}{\min}\;\mathcal{L}_{\mathcal{D}} = \mathcal{L}_{bce}(L_{logic}, \mathcal{D}(L'))\] where \[L' = \left\{\begin{matrix} L_{bol}, & N_{random} > 0.5\\ L_{pred}, & \text{otherwise} \end{matrix}\right.\] Here, \(L_{bol}\) is from the BoL algorithm, \(L_{pred}\) is from the classifier, and \(N_{random}\) is a randomly generated float between 0 and 1.

Figure 4: Bag of Labels
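As an illustration of the poisoning step, the sketch below breaks one rule per label vector in the three ways described above. The attribute group and the exclusive pair are illustrative stand-ins; the real condition groups for FH37K/FH41K follow the rules in [3] and, for CelebA, Figure 2. During discriminator training, such poisoned vectors would receive logic label 0, while labels that pass the rule check keep logic label 1.

```python
import random

# Illustrative stand-ins for one collectively exhaustive group and one
# inter-class impossible pair; the real condition groups follow [3] and Figure 2.
BEARD_AREA = ["clean_shaven", "chin_area", "side_to_side", "area_not_visible"]
IMPOSSIBLE_PAIRS = [("clean_shaven", "beard_length_short")]

def poison(labels: dict) -> dict:
    """Return a copy of `labels` with one logical rule deliberately broken (sketch)."""
    bad = dict(labels)
    mode = random.choice(["inter_impossible", "intra_impossible", "intra_incomplete"])
    if mode == "inter_impossible":
        a, b = random.choice(IMPOSSIBLE_PAIRS)
        bad[a], bad[b] = 1, 1          # mutually exclusive attributes both positive
    elif mode == "intra_impossible":
        a, b = random.sample(BEARD_AREA, 2)
        bad[a], bad[b] = 1, 1          # two positives within one exhaustive group
    else:
        for a in BEARD_AREA:
            bad[a] = 0                 # no positive within an exhaustive group
    return bad
```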

Table 1: Accuracy of models trained with different methods on the FH37K (left) and FH41K (right) datasets. \(\dagger\) means the measurements consider logical consistency. [Keys: Best, \(<\)60%, Second best].
Methods \(Acc_{avg}\) \(Acc^{n}_{avg}\) \(Acc^{p}_{avg}\) \(Acc_{avg}\) \(Acc^{n}_{avg}\) \(Acc^{p}_{avg}\)
Logical consistency is not taken into account...
BCE 79.23 94.72 63.73 83.88 95.50 72.27
BCE-MOON 86.21 90.67 81.75 88.02 91.29 84.75
BF 76.92 95.43 58.41 75.81 97.78 52.85
BCE+LCP 79.64 95.98 63.30 84.93 95.09 74.77
Ours 83.65 93.46 73.83 85.66 94.23 77.10
W/ label compensation...
BCE\(^\dagger\) 80.14 91.49 68.78 79.12 87.43 70.81
BCE-MOON\(^\dagger\) 42.59 50.55 34.62 42.79 47.96 37.61
BF\(^\dagger\) 78.48 90.91 66.05 82.85 93.53 73.17
BCE+LCP\(^\dagger\) 81.44 92.65 70.23 79.31 87.31 71.31
Ours\(^\dagger\) 78.28 87.23 69.32 81.53 89.10 73.96
W/o label compensation (what we care!)...
BCE\(^\dagger\) 48.50 54.59 42.40 56.71 62.14 51.27
BCE-MOON\(^\dagger\) 40.25 47.54 32.95 40.68 45.39 35.98
BF\(^\dagger\) 36.20 40.95 31.45 22.38 23.84 20.92
BCE+LCP\(^\dagger\) 38.69 43.70 33.67 64.54 70.40 58.67
Ours\(^\dagger\) 71.55 79.37 63.73 74.50 81.41 67.59
Table 2: Accuracy of models trained with different methods on CelebA-logic dataset. "original" means the model is tested with the same images but using the original annotations. [Keys: Best, Second best]
Methods W/o considering logical consistency Considering logical consistency (What we care!)
\(Acc_{avg}\) \(Acc^{n}_{avg}\) \(Acc^{p}_{avg}\) \(Acc_{avg}\) \(Acc^{n}_{avg}\) \(Acc^{p}_{avg}\)
AFFACT (original) 81.25 95.72 66.78 79.11 93.55 64.67
ALM (original) 81.97 94.25 69.69 79.04 91.04 67.03
AFFACT 79.71 95.48 63.95 77.72 93.31 62.12
ALM 80.53 94.10 66.95 77.63 90.88 64.39
BCE 80.89 94.96 66.70 77.94 92.34 63.54
BCE-MOON 87.13 87.95 86.32 74.76 76.24 73.28
BF 76.44 96.77 56.11 75.28 95.82 54.75
BCE+LCP 81.91 94.16 69.66 78.07 90.26 65.87
Ours 82.18 93.74 70.63 79.08 90.89 67.28

5 Experiments↩︎

In this section, we evaluate the proposed approach from two aspects: accuracy and logical consistency. For accuracy, the traditional average accuracy measurement (Eq. 1 , where \(N\) = total number of images, \(N_{tp}\) = number of true positive predictions, \(N_{tn}\) = number of true negative predictions) ignores the unbalanced numbers of positive and negative images for each attribute. \[AccT_{avg} = \frac{1}{N}(N_{tp} + N_{tn})\times100 \label{eq:traditional}\tag{1}\] This results in an unfair measure of model performance, since multi-attribute classification datasets suffer from sparse annotations. For example, with the original CelebA annotations, if all predictions are negative, the overall test accuracy is 76.87%. Hence, we follow the suggestion in [43] and use the average of the positive accuracy, \(Acc^{p}_{avg}\), and the negative accuracy, \(Acc^{n}_{avg}\), to account for the imbalance: \[Acc_{avg} = \frac{1}{2}(Acc^{p}_{avg}+Acc^{n}_{avg})\] In addition, to show how logical consistency of predictions affects accuracy, we measure performance under two conditions: 1) without considering the logical consistency of predictions, and 2) considering the logical consistency of predictions, in which case logically inconsistent predictions are deemed incorrect. For FH37K and FH41K, we also include experiments with the label compensation strategy [3] to complete the accuracy comparison. To measure model performance on logical consistency, we performed logical consistency checking on the predictions for 600K images from WebFace260M [42]. We also independently compare the accuracy values on the strong-relationship attributes in CelebA-logic.
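As a concrete reference, the following numpy sketch computes \(Acc_{avg}\) as defined above. Thresholding scores at 0.5 and macro-averaging over attributes are our assumptions about details not spelled out in the text.

```python
import numpy as np

def balanced_attribute_accuracy(scores, labels, thresh=0.5):
    """Acc_avg = (Acc^p_avg + Acc^n_avg) / 2 (sketch).

    scores, labels: (num_images, num_attributes) arrays with labels in {0, 1}.
    The 0.5 threshold and per-attribute macro-averaging are assumptions.
    """
    preds = (scores >= thresh).astype(int)
    acc_p, acc_n = [], []
    for j in range(labels.shape[1]):
        pos, neg = labels[:, j] == 1, labels[:, j] == 0
        if pos.any():
            acc_p.append((preds[pos, j] == 1).mean())  # per-attribute positive accuracy
        if neg.any():
            acc_n.append((preds[neg, j] == 0).mean())  # per-attribute negative accuracy
    acc_p_avg = 100.0 * float(np.mean(acc_p))
    acc_n_avg = 100.0 * float(np.mean(acc_n))
    return 0.5 * (acc_p_avg + acc_n_avg), acc_p_avg, acc_n_avg
```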

To comprehensively study the consequences of not considering logical consistency when models make predictions, we choose four training methods for comparison. Binary Cross Entropy loss (BCE) is a baseline that only considers the entropy between predictions and ground truth labels. Binary Focal loss (BF) [44] focuses more on hard samples in order to mitigate the effect of imbalanced data. BCE-MOON [45] computes the ratio of positive and negative samples for each attribute and uses it to weight the loss values before back propagation, trying to balance the effect of positive and negative samples. Logically Consistent Prediction loss (BCE+LCP) [3] uses conditional probability to force the probability that mutually exclusive attributes occur at the same time toward 0 and the probability that dependent attributes occur at the same time toward 1.
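To make the contrast with the reweighting baseline concrete, the sketch below shows a per-attribute weighted BCE in the spirit of BCE-MOON; the exact weighting scheme of [45] may differ, and `pos_frac` is a hypothetical tensor of per-attribute positive-label fractions.

```python
import torch
import torch.nn.functional as F

def moon_style_bce(logits, targets, pos_frac):
    """Per-attribute reweighted BCE in the spirit of BCE-MOON [45] (sketch).

    pos_frac: (num_attributes,) tensor with the fraction of positive training labels
    per attribute. Positives are weighted by (1 - pos_frac) and negatives by pos_frac,
    so the minority side of each attribute counts more; [45] may use a different formula.
    """
    probs = torch.sigmoid(logits)
    weights = targets * (1.0 - pos_frac) + (1.0 - targets) * pos_frac
    return F.binary_cross_entropy(probs, targets, weight=weights)
```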

5.1 Implementation↩︎

We train all the classifiers starting from the ResNet50 [46] pretrained weights provided by PyTorch2. The FH37K results in Table 1 are adopted from [3] except the values of \(Acc_{avg}\). We resize images to 224x224 for all three datasets. The batch size and learning rate are {256, 0.0001} for FH37K and FH41K, and {64, 0.001} for CelebA-logic. We use random horizontal flip for both FH37K and FH41K. We use random horizontal flip, color jitter, and random rotation for CelebA-logic. AFFACT [47] and ALM [48] are the two SOTA models that are available online, which we used for performance comparison on CelebA-logic. The \(\lambda\) values for FH37K, FH41K, and CelebA are \(\{0.15, 0.2, 0.1\}\). The discriminator consists of 8 multi-headed self-attention blocks, and no positional embedding is used. Note that the ALM algorithm resizes the original (178x218) CelebA images to 128x128 for testing, while the other methods use the cropped images described in Section 3.
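For reference, a discriminator matching the description above could look roughly like the sketch below: each attribute value becomes a token, 8 self-attention blocks process the token sequence, and no positional embedding is added. The hidden width, head count, and pooled output head are our guesses, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class LogicDiscriminator(nn.Module):
    """Label-vector discriminator sketch: 8 self-attention blocks, no positional
    embedding. Width, heads, and the pooled output head are assumptions."""

    def __init__(self, num_attributes, dim=64, heads=4, depth=8):
        super().__init__()
        self.embed = nn.Linear(1, dim)   # each attribute value becomes one token
        block = nn.TransformerEncoderLayer(d_model=dim, nhead=heads,
                                           dim_feedforward=2 * dim, batch_first=True)
        self.blocks = nn.TransformerEncoder(block, num_layers=depth)
        self.head = nn.Linear(dim, 1)

    def forward(self, labels):           # labels: (batch, num_attributes)
        tokens = self.embed(labels.unsqueeze(-1))
        pooled = self.blocks(tokens).mean(dim=1)   # average over attribute tokens
        return torch.sigmoid(self.head(pooled))    # probability of logical consistency
```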

Table 3: Accuracy values of the attributes that have strong logical relationships in CelebA-logic, when considering logical consistency. 5 O’ S, \(^*\)Hairline, and \(^*\)Hat are 5 O’clock Shadow, Receding Hairline and Wearing Hat. [Keys: Best, Second best]
Methods 5 O’ S Bald Bangs Goatee Male Mustache No_Beard \(^*\)Hairline \(^*\)Hat \(Acc_{avg}\)
AFFACT 72.24 90.27 85.36 70.41 95.51 61.86 85.47 65.69 93.93 80.08
ALM 76.34 81.81 85.08 74.27 93.88 64.51 87.66 62.84 90.62 79.67
BCE 65.88 75.68 86.69 76.73 94.78 87.80 80.68 67.28 89.96 80.61
BCE-MOON 69.79 70.77 82.22 80.73 82.09 82.73 77.40 59.67 84.20 76.62
BF 59.38 78.05 78.52 69.33 96.82 87.59 83.12 62.88 92.13 78.65
BCE+LCP 69.83 82.91 84.00 75.33 92.72 88.54 81.30 63.57 89.70 80.88
Ours 68.10 86.03 87.79 79.05 94.54 89.40 81.45 65.55 91.39 82.59
Table 4: Logical consistency test on predictions. The models are trained with FH37K (left) and FH41K (right). \(N_{incomp}\), \(N_{imp}\), and \(R_{failed}\) are the number of incomplete predictions, the number of impossible predictions, and failed ratio. [Keys: Best, \(>50\%\)]
Methods \(N_{incomp}\) \(N_{imp}\) \(R_{failed}\) \(N_{incomp}\) \(N_{imp}\) \(R_{failed}\)
W/ label compensation...
BCE 0 11,134 1.84 0 7,464 1.24
BCE-MOON 0 330,115 54.66 0 341,114 56.48
BF 0 14,007 2.32 0 3,530 0.58
BCE+LCP 0 5,595 0.93 0 5,788 0.96
Ours 0 21,731 3.60 0 19,194 3.18
W/o label compensation (what we care!)...
BCE 240,761 6,001 40.86 352,061 585 58.39
BCE-MOON 31,512 313,044 57.05 34,415 321,872 59.00
BF 339,136 1,295 56.37 587,056 0 97.21
BCE+LCP 307,576 300 50.98 248,768 2,416 41.59
Ours 139,184 14,660 25.47 133,245 13,838 24.36

5.2 Accuracy↩︎

Table 1 and Table 2 show the accuracy values, tested on FH37K, FH41K, and CelebA-logic, under the two measurement conditions. In the traditional case of not considering logical consistency of predictions, every method reaches \(>75\%\) average accuracy, with BCE-MOON {2.56%, 2.36%, 4.95%} higher than the next-highest accuracy on {FH37K, FH41K, CelebA-logic}. The main reason is that BCE-MOON has outstanding performance on positive label prediction, {7.92%, 7.65%, 15.69%} higher than the second-highest accuracy on {FH37K, FH41K, CelebA-logic}. However, when logical consistency is considered, BCE-MOON has a significant accuracy decrease of {45.96%, 47.43%, 12.37%} on the three datasets. Note that the accuracy decrease occurs across all training methods.

For FH37K and FH41K, except for the proposed method, the average decreases in accuracy are 39.59% and 37.03%, respectively. Seven out of eight results have \(<60\%\) accuracy, and the lowest accuracies are only 36.2% and 22.38%. These results show how much the traditional methods suffer from predicting logically inconsistent labels. Note that these methods aim to solve different problems in multi-attribute classification. The proposed method has only a {12.1%, 11.16%} decrease in accuracy, and its overall accuracy is {23.05%, 9.96%} higher than the second-highest accuracy and {35.35%, 52.12%} higher than the lowest accuracy.

 [3] proposed a post-processing step, termed the label compensation strategy, to resolve incomplete predictions. By using this strategy, the methods except BCE-MOON have a significant increase in accuracy, 30.85% on average. This leads to two conclusions: 1) Methods that aim to mitigate the effect of imbalanced data might give an illusion of high accuracy driven by positive predictions; 2) Other methods can somewhat catch the logical patterns, but need to involve post-processing steps. However, the label compensation strategy only addresses the collectively exhaustive case (i.e., the model must give one positive prediction in each attribute group). For example, in FH37K and FH41K, the attributes {clean-shaven, chin-area, side-to-side, beard-area-information-not-visible} in the Beard Area group cover any case related to beard area. Implementing this type of strategy necessitates extensive manual analysis to determine an appropriate decision-making process, underscoring the need for continued research in this domain; a sketch of one such rule is shown below.
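To illustrate why such post-processing must be designed by hand, the sketch below shows one plausible compensation rule for the collectively exhaustive case: if a group ends up with no positive prediction, promote its highest-scoring attribute. This is our reading of the idea in [3], not their exact procedure.

```python
def compensate_group(scores, group_indices, thresh=0.5):
    """Fill an all-negative collectively exhaustive group with its top-scoring
    attribute (sketch of one plausible rule, not the exact procedure of [3]).

    scores: per-attribute scores for one image; group_indices: indices of one
    exhaustive group, e.g. the Beard Area attributes.
    """
    preds = {i: int(scores[i] >= thresh) for i in group_indices}
    if sum(preds.values()) == 0:
        best = max(group_indices, key=lambda i: scores[i])
        preds[best] = 1                 # force exactly one positive in the group
    return preds
```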

Table 5: Ablation study for training a logic discriminator resulting in the accuracy of the classifier. The testing sets are FH37K (Top), FH41K (Middle), CelebA-logic (Bottom). "preds" and "BoL" represent classifier predictions and poisoned labels. [Keys: Best]
Methods W/o considering logical consistency Considering logical consistency (What we care!)
\(Acc_{avg}\) \(Acc^{n}_{avg}\) \(Acc^{p}_{avg}\) \(Acc_{avg}\) \(Acc^{n}_{avg}\) \(Acc^{p}_{avg}\)
LogicNet (preds) 82.63 93.42 71.83 65.94 74.18 57.70
LogicNet (BoL) 81.90 93.04 70.77 65.04 73.23 56.86
LogicNet (preds + BoL) 83.65 93.46 73.83 71.55 79.37 63.73
LogicNet (preds) 85.72 94.12 77.32 72.83 79.38 66.28
LogicNet (BoL) 85.48 94.05 76.91 73.03 79.96 66.11
LogicNet (preds + BoL) 85.66 94.23 77.10 74.50 81.41 67.59
LogicNet (preds) 81.46 94.42 68.49 77.95 91.07 64.82
LogicNet (BoL) 80.65 94.18 67.12 78.15 91.72 64.58
LogicNet (preds + BoL) 82.18 93.74 70.63 79.08 90.89 67.28

For CelebA-logic, when considering the logical consistency of predictions, the patterns echo the previous observations. For both AFFACT and ALM, we use the original model weights provided by the authors. The top half of Table 2 shows that, whether using the original annotations or the cleaned annotations, there is a 2.49% accuracy decrease after considering logical consistency. The average accuracy decrease of the models tested on the cleaned annotations is 4.04%, where BF has the smallest accuracy difference and BCE-MOON has the largest. Our speculation is that BF over-focuses on negative attributes, while logical conflicts mostly arise among positive labels, so BF is less likely to violate logical relationships. Conversely, BCE-MOON over-focuses on the positive side, so it is more likely to violate logical relationships. Results in Table 2 and Table 3 show that the proposed method has the best performance on the average accuracy of all attributes and of the strong-relationship attributes, where it is {1.01%, 1.71%} higher than the second-highest accuracy. Therefore, the proposed method has the best ability to learn the logical relationships.

5.3 Logical Consistency↩︎

To evaluate the logical consistency of predictions in a real-world setting, we use a subset of WebFace260M containing 603,910 images as a test set. Since there are no ground truth labels, we only measure the ratio of failed (logically inconsistent) predictions for each method.
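Concretely, the failed ratio reported below can be computed roughly as in the sketch: an image counts as failed if it violates either the collectively exhaustive rules or the impossible-pair rules. The exact rule sets per dataset follow Section 3 and [3]; the function signature and rule encoding are our assumptions.

```python
def failed_ratio(pred_rows, exhaustive_groups, impossible_pairs):
    """Count incomplete and impossible predictions over a set of images (sketch).

    pred_rows: iterable of {attribute: 0/1} prediction dicts;
    exhaustive_groups: attribute groups that must contain exactly one positive;
    impossible_pairs: attribute pairs that must not both be positive.
    """
    rows = list(pred_rows)
    n_incomplete = sum(
        any(sum(row[a] for a in group) == 0 for group in exhaustive_groups)
        for row in rows)
    n_impossible = sum(
        any(row[a] == 1 and row[b] == 1 for a, b in impossible_pairs)
        for row in rows)
    ratio = 100.0 * (n_incomplete + n_impossible) / max(len(rows), 1)
    return n_incomplete, n_impossible, ratio
```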

Table 4 shows that, without the post-processing step, the average failed ratio of the four commonly used methods is {51%, 64.05%} on the two datasets. BF trained with FH41K predicts too many negative labels, which causes the outlier ratio of 97%. The proposed method significantly reduces the number of failed cases, with a failed ratio of {25.47%, 24.36%}, less than half of the average failed ratio. When we apply the post-processing strategy, all the incomplete cases are eliminated, which results in a low failed ratio for all methods other than BCE-MOON. This supports the aforementioned speculation that BCE-MOON over-focuses on the positive side and that existing methods can somewhat learn the logical patterns but need post-processing steps. The logical consistency test on the classifiers trained with CelebA-logic is in the Supplementary Material.

5.4 Ablation Study↩︎

To show the effectiveness of our method in adversarial training, we conducted an experiment comparing three ways of training a discriminator: 1) directly feed the predictions to the discriminator as negative samples, 2) directly feed the poisoned labels to the discriminator, and 3) randomly feed either the predictions or the poisoned labels to the discriminator. As Table 5 shows, the proposed combination achieves a {5.61%, 1.47%, 0.93%} accuracy increase on the three datasets when logical consistency is considered.

6 Conclusions and Limitations↩︎

We point out that the problem of logical consistency of attribute predictions in computer vision has received little attention to date. To fill this void, we provide two new datasets for two logical consistency challenges: 1) train a classifier with logical-consistency-checked data so that it makes logically consistent predictions, and 2) train a classifier with training data that contains logically inconsistent labels and still achieve logically consistent predictions. To the best of our knowledge, this is the first work that comprehensively discusses the problem of logical consistency of predictions in multi-attribute classification.

We propose LogicNet, which involves no post-processing step and significantly increases performance, {23.05% (FH37K), 9.96% (FH41K), 1.71% (CelebA-logic)} higher than the second best, under the logical-consistency-checked condition for all three datasets. For the real-world case analysis, the proposed method can largely reduce the failed ratio of the predictions.

The proposed method provides a general solution that makes model predictions more logically consistent than previous methods, but the accuracy difference before and after considering the logical consistency of predictions is still large, and the failed ratio is not negligible for both challenges. Further research is needed to improve logical consistency in attribute predictions.

References↩︎

[1]
Z. Liu, P. Luo, X. Wang, and X. Tang, “Deep learning face attributes in the wild,” in ICCV, 2015, pp. 3730–3738, doi: 10.1109/ICCV.2015.425.
[2]
N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Attribute and simile classifiers for face verification,” in ICCV, 2009, pp. 365–372, doi: 10.1109/ICCV.2009.5459250.
[3]
H. Wu, G. Bezold, A. Bhatta, and K. W. Bowyer, “Logical consistency and greater descriptive power for facial hair attribute learning,” in CVPR, 2023, pp. 8588–8597, doi: 10.1109/CVPR52729.2023.00830.
[4]
Z. Liu, P. Luo, S. Qiu, X. Wang, and X. Tang, “DeepFashion: Powering robust clothes recognition and retrieval with rich annotations,” in CVPR, 2016, pp. 1096–1104, doi: 10.1109/CVPR.2016.124.
[5]
H. Chen, A. C. Gallagher, and B. Girod, “Describing clothing by semantic attributes,” in ECCV, 2012, vol. 7574, pp. 609–623, doi: 10.1007/978-3-642-33712-3_44.
[6]
C. H. Lampert, H. Nickisch, and S. Harmeling, “Learning to detect unseen object classes by between-class attribute transfer,” in CVPR, 2009, pp. 951–958, doi: 10.1109/CVPR.2009.5206594.
[7]
J. Liu, B. Kuipers, and S. Savarese, “Recognizing human actions by attributes,” in CVPR, 2011, pp. 3337–3344, doi: 10.1109/CVPR.2011.5995353.
[8]
[9]
T. Berg and P. N. Belhumeur, “POOF: Part-based one-vs.-one features for fine-grained categorization, face verification, and attribute estimation,” in CVPR, 2013, pp. 955–962, doi: 10.1109/CVPR.2013.128.
[10]
J.-S. Chan, G.-S. J. Hsu, H.-C. Shie, and Y.-X. Chen, “Face recognition by facial attribute assisted network,” in ICIP, 2017, pp. 3825–3829, doi: 10.1109/ICIP.2017.8296998.
[11]
N. Kumar, A. C. Berg, P. N. Belhumeur, and S. K. Nayar, “Describable visual attributes for face verification and image search,” PAMI, vol. 33, no. 10, pp. 1962–1977, 2011, doi: 10.1109/TPAMI.2011.48.
[12]
F. Song, X. Tan, and S. Chen, “Exploiting relationship between attributes for improved face verification,” Comput. Vis. Image Underst., vol. 122, pp. 143–154, 2014, doi: 10.1016/j.cviu.2014.02.010.
[13]
Z. Shi, T. M. Hospedales, and T. Xiang, “Transferring a semantic representation for person re-identification and search,” arXiv preprint arXiv:1706.03725, 2017. [Online]. Available: http://arxiv.org/abs/1706.03725.
[14]
C. Su, F. Yang, S. Zhang, Q. Tian, L. S. Davis, and W. Gao, “Multi-task learning with low rank attribute embedding for multi-camera person re-identification,” PAMI, vol. 40, no. 5, pp. 1167–1181, 2018, doi: 10.1109/TPAMI.2017.2679002.
[15]
C. Su, S. Zhang, J. Xing, W. Gao, and Q. Tian, “Deep attributes driven multi-camera person re-identification,” in ECCV, 2016, vol. 9906, pp. 475–491, doi: 10.1007/978-3-319-46475-6_30.
[16]
Y. Choi, M.-J. Choi, M. Kim, J.-W. Ha, S. Kim, and J. Choo, “StarGAN: Unified generative adversarial networks for multi-domain image-to-image translation,” in CVPR, 2018, pp. 8789–8797, doi: 10.1109/CVPR.2018.00916.
[17]
Y. Choi, Y. Uh, J. Yoo, and J.-W. Ha, “StarGAN v2: Diverse image synthesis for multiple domains,” in CVPR, 2020, pp. 8185–8194, doi: 10.1109/CVPR42600.2020.00821.
[18]
Z. He, W. Zuo, M. Kan, S. Shan, and X. Chen, “AttGAN: Facial attribute editing by only changing what you want,” TIP, vol. 28, no. 11, pp. 5464–5478, 2019, doi: 10.1109/TIP.2019.2916751.
[19]
D. Li, J. Yang, K. Kreis, A. Torralba, and S. Fidler, “Semantic segmentation with generative models: Semi-supervised learning and strong out-of-domain generalization,” in CVPR, 2021, pp. 8300–8311, doi: 10.1109/CVPR46437.2021.00820.
[20]
[21]
P. Terhörst et al., “A comprehensive study on face recognition biases beyond demographics,” IEEE Transactions on Technology and Society, 2021. [Online]. Available: https://arxiv.org/abs/2103.01592.
[22]
A. Bhatta, V. Albiero, K. W. Bowyer, and M. C. King, “The gender gap in face recognition accuracy is a hairy problem,” in WACVW, 2023, pp. 1–10, doi: 10.1109/WACVW58289.2023.00034.
[23]
[24]
H. Wu, G. Bezold, M. Günther, T. E. Boult, M. C. King, and K. W. Bowyer, “Consistency and accuracy of CelebA attribute values,” in CVPRW, 2023, pp. 3258–3266, doi: 10.1109/CVPRW59228.2023.00328.
[25]
B. Lingenfelter, S. R. Davis, and E. M. Hand, “A quantitative analysis of labeling issues in the CelebA dataset,” in ISVC, 2022, vol. 13598, pp. 129–141, doi: 10.1007/978-3-031-20713-6_10.
[26]
O. A. Arigbabu, S. M. S. Ahmad, W. A. W. Adnan, and S. Yussof, “Recent advances in facial soft biometrics,” Vis. Comput., vol. 31, no. 5, pp. 513–525, 2015, doi: 10.1007/s00371-014-0990-x.
[27]
F. Becerra-Riera, A. Morales-González, and H. Méndez-Vázquez, “A survey on facial soft biometrics for video surveillance and forensic applications,” Artif. Intell. Rev., vol. 52, no. 2, pp. 1155–1187, 2019, doi: 10.1007/s10462-019-09689-5.
[28]
N. Thom and E. M. Hand, “Facial attribute recognition: A survey,” Computer Vision: A Reference Guide, pp. 1–13, 2020.
[29]
X. Zheng, Y. Guo, H. Huang, Y. Li, and R. He, “A survey of deep facial attribute analysis,” IJCV, vol. 128, no. 8, pp. 2002–2034, 2020, doi: 10.1007/s11263-020-01308-z.
[30]
[31]
P. Clark, O. Tafjord, and K. Richardson, “Transformers as soft reasoners over language,” in IJCAI, 2020, pp. 3882–3890, doi: 10.24963/ijcai.2020/537.
[32]
L. Pan, W. Chen, W. Xiong, M.-Y. Kan, and W. Y. Wang, “Unsupervised multi-hop question answering by question generation,” in NAACL-HLT, 2021, pp. 5866–5880, doi: 10.18653/v1/2021.naacl-main.469.
[33]
J. Weston, A. Bordes, S. Chopra, et al., “Towards AI-complete question answering: A set of prerequisite toy tasks,” in ICLR, 2016. [Online]. Available: http://arxiv.org/abs/1502.05698.
[34]
Q. Bao et al., “Multi-step deductive reasoning over natural language: An empirical study on out-of-distribution generalisation,” in IJCLR, CEUR Workshop Proceedings, vol. 3212, 2022, pp. 202–217. [Online]. Available: https://ceur-ws.org/Vol-3212/paper15.pdf.
[35]
[36]
H. Ding, H. Zhou, S. K. Zhou, and R. Chellappa, “A deep cascade network for unaligned face attribute classification,” in AAAI, 2018, pp. 6789–6796, doi: 10.1609/aaai.v32i1.12303.
[37]
J. Cao, Y. Li, and Z. Zhang, “Partially shared multi-task convolutional neural network with local constraint for face attribute learning,” in CVPR, 2018, pp. 4290–4299, doi: 10.1109/CVPR.2018.00451.
[38]
H. Han, A. K. Jain, F. Wang, S. Shan, and X. Chen, “Heterogeneous face attribute estimation: A deep multi-task learning approach,” PAMI, vol. 40, no. 11, pp. 2597–2609, 2018, doi: 10.1109/TPAMI.2017.2738004.
[39]
F. Taherkhani, A. Dabouei, S. Soleymani, J. M. Dawson, and N. M. Nasrabadi, “Tasks structure regularization in multi-task learning for improving facial attribute prediction,” arXiv preprint arXiv:2108.04353, 2021. [Online]. Available: https://arxiv.org/abs/2108.04353.
[40]
S. Wang, G. Peng, and Z. Zheng, “Capturing joint label distribution for multi-label classification through adversarial learning,” IEEE Trans. Knowl. Data Eng., vol. 32, no. 12, pp. 2310–2321, 2020, doi: 10.1109/TKDE.2019.2922603.
[41]
[42]
Z. Zhu et al., “WebFace260M: A benchmark unveiling the power of million-scale deep face recognition,” in CVPR, 2021, pp. 10492–10502, doi: 10.1109/CVPR46437.2021.01035.
[43]
C. Huang, Y. Li, C. C. Loy, and X. Tang, “Deep imbalanced learning for face recognition and attribute prediction,” PAMI, vol. 42, no. 11, pp. 2781–2794, 2020, doi: 10.1109/TPAMI.2019.2914680.
[44]
T.-Y. Lin, P. Goyal, R. B. Girshick, K. He, and P. Dollár, “Focal loss for dense object detection,” PAMI, vol. 42, no. 2, pp. 318–327, 2020, doi: 10.1109/TPAMI.2018.2858826.
[45]
E. M. Rudd, M. Günther, and T. E. Boult, “MOON: A mixed objective optimization network for the recognition of facial attributes,” in ECCV, 2016, vol. 9909, pp. 19–35, doi: 10.1007/978-3-319-46454-1_2.
[46]
K. He, X. Zhang, S. Ren, and J. Sun, “Deep residual learning for image recognition,” in CVPR, 2016, pp. 770–778, doi: 10.1109/CVPR.2016.90.
[47]
M. Günther, A. Rozsa, and T. E. Boult, “AFFACT: Alignment-free facial attribute classification technique,” in IJCB, 2017, pp. 90–99, doi: 10.1109/BTAS.2017.8272686.
[48]
“Face attribute recognition via end-to-end weakly supervised regional location,” Multim. Syst., vol. 29, no. 4, pp. 2137–2152, 2023, doi: 10.1007/S00530-023-01095-W.

  1. https://github.com/HaiyuWu/LogicalConsistency#testing↩︎

  2. https://pytorch.org/hub/pytorch_vision_resnet/↩︎