# Everything old is new again: A multi-view learning approach to learning using privileged information and distillation

    @article{Wang2019EverythingOI,
      title={Everything old is new again: A multi-view learning approach to learning using privileged information and distillation},
      author={Weiran Wang},
      journal={ArXiv},
      year={2019},
      volume={abs/1903.03694}
    }

We adopt a multi-view approach for analyzing two knowledge transfer settings---learning using privileged information (LUPI) and distillation---in a common framework. Under reasonable assumptions about the complexities of the hypothesis spaces, and being optimistic about the expected loss achievable by the student (in distillation) and a transformed teacher predictor (in LUPI), we show that encouraging agreement between the teacher and the student leads to a reduced search space. As a result, improved…
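The teacher-student agreement mechanism at the core of both settings can be illustrated with a standard distillation objective. This is a minimal NumPy sketch, not the paper's construction; the weighting `alpha` and temperature `T` are illustrative hyperparameters:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-scaled softmax: higher T yields softer distributions.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distill_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    """Cross-entropy on hard labels plus a KL agreement term that pulls
    the student's (softened) predictions toward the teacher's."""
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(len(labels)), labels]).mean()
    p_t = softmax(teacher_logits, T)
    p_s_soft = softmax(student_logits, T)
    kl = (p_t * (np.log(p_t) - np.log(p_s_soft))).sum(axis=-1).mean()
    return alpha * ce + (1.0 - alpha) * kl
```

With `alpha = 0` the student trains purely to agree with the teacher; the analysis in the paper treats such an agreement term as effectively shrinking the student's search space.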


#### 2 Citations

Leveraging Just a Few Keywords for Fine-Grained Aspect Detection Through Weakly Supervised Co-Training

- Computer Science, Mathematics
- EMNLP/IJCNLP
- 2019

This work considers weakly supervised approaches for training aspect classifiers that require the user to provide only a small set of seed words for aspect detection. It proposes a student-teacher approach that leverages the seed words in a bag-of-words classifier (the teacher), which in turn is used to train a second, potentially more powerful model (e.g., a neural network that uses pre-trained word embeddings).

Large Scale Long-tailed Product Recognition System at Alibaba

- Computer Science
- CIKM
- 2020

A novel side-information-based large-scale visual recognition co-training (SICoT) system is presented that addresses the long-tail problem by leveraging image-related side information and a semantic embedding learned from the noisy side information.

#### References

Showing 1–10 of 22 references

Unifying distillation and privileged information

- Computer Science, Mathematics
- ICLR
- 2016

Theoretical and causal insight into the inner workings of generalized distillation is provided; the framework is extended to unsupervised, semi-supervised, and multitask learning scenarios, and its efficacy is illustrated on a variety of numerical simulations on both synthetic and real-world data.

On the theory of learning with Privileged Information

- Mathematics
- NIPS
- 2010

In the Learning Using Privileged Information (LUPI) paradigm, along with the standard training data in the decision space, a teacher supplies a learner with privileged information in the correcting…

A Co-Regularization Approach to Semi-supervised Learning with Multiple Views

- Mathematics
- 2005

The Co-Training algorithm uses unlabeled examples in multiple views to bootstrap classifiers in each view, typically in a greedy manner, operating under assumptions of view independence and…

Learning using privileged information: similarity control and knowledge transfer

- Computer Science
- J. Mach. Learn. Res.
- 2015

Two mechanisms that can significantly accelerate a student's learning using privileged information are described: correction of the student's concepts of similarity between examples, and direct teacher-student knowledge transfer.

A new learning paradigm: Learning using privileged information

- Computer Science, Medicine
- Neural Networks
- 2009

Details of the new paradigm and corresponding algorithms are discussed, some new algorithms are introduced, several specific forms of privileged information are considered, and the superiority of the new learning paradigm over the classical one in solving practical problems is demonstrated.

The Rademacher Complexity of Co-Regularized Kernel Classes

- Computer Science
- AISTATS
- 2007

The co-regularization method used in the CoRLS algorithm, in which the views are reproducing kernel Hilbert spaces (RKHSs), is examined; co-regularization reduces the Rademacher complexity by an amount that depends on the distance between the two views, as measured by a data-dependent metric.
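For two linear views, a CoRLS-style objective admits a closed-form solution: fit each view to the labels while penalizing disagreement on unlabeled points. A minimal sketch under assumed notation (`lam` is the ridge weight, `mu` the co-regularization weight; both defaults are illustrative):

```python
import numpy as np

def corls_fit(X1, X2, y, U1, U2, lam=0.1, mu=1.0):
    """Co-regularized least squares with two linear views.

    Minimizes ||X1 w1 - y||^2 + ||X2 w2 - y||^2
              + lam (||w1||^2 + ||w2||^2)
              + mu ||U1 w1 - U2 w2||^2   (disagreement on unlabeled data)
    by solving the block-linear stationarity conditions."""
    d1, d2 = X1.shape[1], X2.shape[1]
    A = np.block([
        [X1.T @ X1 + lam * np.eye(d1) + mu * (U1.T @ U1), -mu * (U1.T @ U2)],
        [-mu * (U2.T @ U1), X2.T @ X2 + lam * np.eye(d2) + mu * (U2.T @ U2)],
    ])
    b = np.concatenate([X1.T @ y, X2.T @ y])
    w = np.linalg.solve(A, b)
    return w[:d1], w[d1:]
```

Raising `mu` forces the two views to agree on unlabeled data, which is the mechanism the reference shows reduces the Rademacher complexity of the joint function class.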

Distilling the Knowledge in a Neural Network

- Mathematics, Computer Science
- ArXiv
- 2015

This work shows that the acoustic model of a heavily used commercial system can be significantly improved by distilling the knowledge in an ensemble of models into a single model; it also introduces a new type of ensemble composed of one or more full models and many specialist models that learn to distinguish fine-grained classes the full models confuse.

Combining labeled and unlabeled data with co-training

- Computer Science
- COLT '98
- 1998

A PAC-style analysis is provided for a problem setting motivated by the task of learning to classify web pages, in which the description of each example can be partitioned into two distinct views, allowing inexpensive unlabeled data to augment a much smaller set of labeled examples.
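The co-training loop itself is short. Here is a toy sketch with nearest-centroid classifiers standing in for the per-view learners; the function names, the confidence heuristic, and the defaults are all illustrative, not from the original paper:

```python
import numpy as np

def centroid_fit(X, y):
    # One centroid per class; a stand-in for a real per-view classifier.
    return np.stack([X[y == c].mean(axis=0) for c in (0, 1)])

def centroid_predict(centroids, X):
    d = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=-1)
    return d.argmin(axis=1), -d.min(axis=1)  # labels, crude confidence

def co_train(X1, X2, y_init, labeled, rounds=12, k=5):
    """Blum-Mitchell-style loop: each view's classifier pseudo-labels the
    unlabeled examples it is most confident about, growing the shared pool."""
    y, labeled = y_init.copy(), labeled.copy()
    for _ in range(rounds):
        for X in (X1, X2):
            unl = np.flatnonzero(~labeled)
            if unl.size == 0:
                return y
            cents = centroid_fit(X[labeled], y[labeled])
            pred, conf = centroid_predict(cents, X[unl])
            order = np.argsort(-conf)[:k]     # most confident first
            y[unl[order]] = pred[order]
            labeled[unl[order]] = True
    return y
```

The PAC analysis in the reference justifies this bootstrapping when the two views are (roughly) conditionally independent given the label.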

Multi-view Regression Via Canonical Correlation Analysis

- Mathematics, Computer Science
- COLT
- 2007

This work provides a semi-supervised algorithm which first uses unlabeled data to learn a norm (or, equivalently, a kernel) and then uses labeled data in a ridge regression algorithm (with this induced norm) to provide the predictor.
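The CCA computation underlying this algorithm can be sketched as an SVD of the whitened cross-covariance. A minimal sketch; `reg` is a small ridge added for numerical stability and is not part of the reference's analysis:

```python
import numpy as np

def cca(X, Y, reg=1e-8):
    """Canonical correlation analysis via SVD of the whitened
    cross-covariance; returns projection directions and correlations."""
    n = len(X)
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Cxx = Xc.T @ Xc / n + reg * np.eye(X.shape[1])
    Cyy = Yc.T @ Yc / n + reg * np.eye(Y.shape[1])
    Cxy = Xc.T @ Yc / n
    Lx, Ly = np.linalg.cholesky(Cxx), np.linalg.cholesky(Cyy)
    # T = Lx^{-1} Cxy Ly^{-T}: cross-covariance in whitened coordinates.
    T = np.linalg.solve(Lx, Cxy) @ np.linalg.inv(Ly).T
    U, s, Vt = np.linalg.svd(T)
    A = np.linalg.solve(Lx.T, U)       # canonical directions for view X
    B = np.linalg.solve(Ly.T, Vt.T)    # canonical directions for view Y
    return A, B, s                     # s holds the canonical correlations
```

In the multi-view regression setting, directions like these, learned from unlabeled two-view data, induce the norm (equivalently, the kernel) used by the subsequent ridge regression.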

Efficient Co-Training of Linear Separators under Weak Dependence

- Computer Science
- COLT
- 2017

We develop the first polynomial-time algorithm for co-training of homogeneous linear separators under weak dependence, a relaxation of the condition of independence given the label. Our algorithm…