H. Akaike, Information theory and an extension of the maximum likelihood principle, pp.267-281, 1973.

P. Alquier, Transductive and inductive adaptative inference for regression and density estimation, 2006.
URL : https://hal.archives-ouvertes.fr/tel-00119593

P. Alquier, PAC-Bayesian bounds for randomized empirical risk minimizers, Mathematical Methods of Statistics, vol.17, issue.4, pp.279-304, 2008.
DOI : 10.3103/S1066530708040017
URL : https://hal.archives-ouvertes.fr/hal-00354922

A. Ambroladze, E. Parrado-hernández, and J. Shawe-taylor, Tighter PAC-Bayes bounds, Advances in Neural Information Processing Systems 18, pp.9-16, 2006.

J. Audibert, PAC-Bayesian statistical learning theory, 2004.

J. Audibert, Progressive mixture rules are deviation suboptimal, Advances in Neural Information Processing Systems, 2007.

J. Audibert, Fast learning rates in statistical inference through aggregation, The Annals of Statistics, vol.37, issue.4, pp.1591-1646, 2009.
DOI : 10.1214/08-AOS623
URL : https://hal.archives-ouvertes.fr/hal-00139030

J. Audibert and O. Bousquet, Combining PAC-Bayesian and generic chaining bounds, pp.863-889, 2007.

J. Audibert and S. Bubeck, Minimax policies for adversarial and stochastic bandits.

J. Audibert and S. Bubeck, Regret bounds and minimax policies under partial monitoring, Journal of Machine Learning Research, 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654356

J. Audibert, S. Bubeck, and R. Munos, Best Arm Identification in Multi-Armed Bandits, Proceedings of the 23rd annual conference on Computational Learning Theory (COLT), 2010.
URL : https://hal.archives-ouvertes.fr/hal-00654404

J. Audibert and O. Catoni, Risk bounds in linear regression through PAC-bayesian truncation, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00360268

J. Audibert, R. Munos, and C. Szepesvári, Exploration-exploitation tradeoff using variance estimates in multi-armed bandits, Theoretical Computer Science, vol.410, issue.19, pp.1876-1902, 2009.
DOI : 10.1016/j.tcs.2009.01.016
URL : https://hal.archives-ouvertes.fr/hal-00711069

J. Audibert and A. B. Tsybakov, Fast learning rates for plug-in classifiers, The Annals of Statistics, vol.35, issue.2, pp.608-633, 2007.
DOI : 10.1214/009053606000001217
URL : https://hal.archives-ouvertes.fr/hal-00160849

P. Auer, Using confidence bounds for exploitation-exploration trade-offs, Journal of Machine Learning Research, vol.3, pp.397-422, 2002.

P. Auer, N. Cesa-bianchi, and P. Fischer, Finite-time analysis of the multiarmed bandit problem, Machine Learning, vol.47, issue.2/3, pp.235-256, 2002.
DOI : 10.1023/A:1013689704352

P. Auer, N. Cesa-bianchi, Y. Freund, and R. Schapire, The Nonstochastic Multiarmed Bandit Problem, SIAM Journal on Computing, vol.32, issue.1, pp.48-77, 2002.
DOI : 10.1137/S0097539701398375

P. Auer, R. Ortner, and C. Szepesvári, Improved Rates for the Stochastic Continuum-Armed Bandit Problem, 20th COLT, 2007.
DOI : 10.1007/978-3-540-72927-3_33

M. Babaioff, Y. Sharma, and A. Slivkins, Characterizing truthful multiarmed bandit mechanisms: extended abstract, Proceedings of the tenth ACM conference on Electronic commerce, pp.79-88, 2009.

F. R. Bach, Consistency of the group Lasso and multiple kernel learning, Journal of Machine Learning Research, vol.9, pp.1179-1225, 2008.
URL : https://hal.archives-ouvertes.fr/hal-00164735

F. R. Bach, G. R. Lanckriet, and M. I. Jordan, Multiple kernel learning, conic duality, and the SMO algorithm, Twenty-first international conference on Machine learning , ICML '04, 2004.
DOI : 10.1145/1015330.1015424

A. Barron, Are Bayes Rules Consistent in Information?, Open Problems in Communication and Computation, pp.85-91, 1987.
DOI : 10.1007/978-1-4612-4808-8_22

A. Barron and Y. Yang, Information-theoretic determination of minimax rates of convergence, The Annals of Statistics, vol.27, issue.5, pp.1564-1599, 1999.
DOI : 10.1214/aos/1017939142

P. L. Bartlett, O. Bousquet, and S. Mendelson, Local Rademacher complexities, The Annals of Statistics, vol.33, issue.4, pp.1497-1537, 2005.
DOI : 10.1214/009053605000000282

P. L. Bartlett, M. I. Jordan, and J. D. Mcauliffe, Convexity, Classification, and Risk Bounds, Journal of the American Statistical Association, vol.101, issue.473, pp.138-156, 2006.
DOI : 10.1198/016214505000000907
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.15.3497

P. L. Bartlett and M. Traskin, Adaboost is consistent, Journal of Machine Learning Research, vol.8, pp.2347-2368, 2007.

D. Bergemann and J. Valimaki, Bandit Problems, The New Palgrave Dictionary of Economics, 2008.
DOI : 10.1057/978-1-349-95121-5_2386-1

D. A. Berry, R. W. Chen, A. Zame, D. C. Heath, and L. A. Shepp, Bandit problems with infinitely many arms, The Annals of Statistics, vol.25, issue.5, pp.2103-2116, 1997.
DOI : 10.1214/aos/1069362389

L. Birgé and P. Massart, Minimum Contrast Estimators on Sieves: Exponential Bounds and Rates of Convergence, Bernoulli, vol.4, issue.3, pp.329-375, 1998.
DOI : 10.2307/3318720

L. Birgé and P. Massart, Minimal penalties for Gaussian model selection, Probability Theory and Related Fields, pp.33-73, 2007.

G. Blanchard, Méthodes de mélange et d'agrégation d'estimateurs en reconnaissance de formes. Application aux arbres de décision, 2001.

S. Boucheron, O. Bousquet, and G. Lugosi, Theory of Classification: a Survey of Some Recent Advances, ESAIM: Probability and Statistics, vol.9, pp.323-375, 2005.
DOI : 10.1051/ps:2005018
URL : https://hal.archives-ouvertes.fr/hal-00017923

S. Bubeck, R. Munos, G. Stoltz, and C. Szepesvari, Online optimization in X-armed bandits, Advances in Neural Information Processing Systems 21, pp.201-208, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00329797

F. Bunea, A. B. Tsybakov, and M. H. Wegkamp, Aggregation for Gaussian regression, 2007.
DOI : 10.1214/009053606000001587

O. Catoni, Statistical learning theory and stochastic optimization, 2001.

O. Catoni, A mixture approach to universal model selection, preprint LMENS 97-30, Available from http, 1997.

O. Catoni, Universal aggregation rules with exact bias bound.

O. Catoni, A PAC-Bayesian approach to adaptive classification, 2003.

O. Catoni, PAC-Bayesian supervised classification: the thermodynamics of statistical learning, Lecture Notes series of the IMS, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00206119

O. Catoni, High confidence estimates of the mean of heavy-tailed real random variables, 2009.
URL : https://hal.archives-ouvertes.fr/hal-00423460

N. Cesa-bianchi, Analysis of two gradient-based algorithms for on-line regression, Proceedings of the tenth annual conference on Computational learning theory , COLT '97, pp.392-411, 1999.
DOI : 10.1145/267460.267492

N. Cesa-bianchi, Y. Freund, D. Haussler, D. P. Helmbold, R. E. Schapire et al., How to use expert advice, Journal of the ACM, vol.44, issue.3, pp.427-485, 1997.
DOI : 10.1145/258128.258179

N. Cesa-bianchi and G. Lugosi, On Prediction of Individual Sequences, SSRN Electronic Journal, vol.27, issue.6, pp.1865-1895, 1999.
DOI : 10.2139/ssrn.139692

N. Cesa-bianchi, G. Lugosi, and G. Stoltz, Minimizing Regret With Label Efficient Prediction, IEEE Transactions on Information Theory, vol.51, issue.6, pp.2152-2162, 2005.
DOI : 10.1109/TIT.2005.847729
URL : https://hal.archives-ouvertes.fr/hal-00007537

P. A. Coquelin and R. Munos, Bandit algorithms for tree search, In Uncertainty in Artificial Intelligence, 2007.
URL : https://hal.archives-ouvertes.fr/inria-00150207

P. Dagum, R. Karp, M. Luby, and S. Ross, An Optimal Algorithm for Monte Carlo Estimation, SIAM Journal on Computing, vol.29, issue.5, pp.1484-1496, 2000.
DOI : 10.1137/S0097539797315306

A. Dalalyan and A. Tsybakov, Aggregation by exponential weighting, sharp oracle inequalities and sparsity, Machine Learning, pp.39-61, 2008.

A. Dalalyan and A. Tsybakov, Sparse regression learning by aggregation and Langevin Monte-Carlo, 22nd Annual Conference on Learning Theory, 2009.
DOI : 10.1016/j.jcss.2011.12.023
URL : https://hal.archives-ouvertes.fr/hal-00362471

N. R. Devanur and S. M. Kakade, The price of truthfulness for pay-per-click auctions, Proceedings of the tenth ACM conference on Electronic commerce, pp.99-106, 2009.

L. Devroye, L. Györfi, and G. Lugosi, A Probabilistic Theory of Pattern Recognition, 1996.
DOI : 10.1007/978-1-4612-0711-5

C. Domingo, R. Gavaldà, and O. Watanabe, Adaptive sampling methods for scaling up knowledge discovery algorithms, Data Mining and Knowledge Discovery, vol.6, issue.2, pp.131-152, 2002.
DOI : 10.1023/A:1014091514039

R. M. Dudley, Central Limit Theorems for Empirical Measures, The Annals of Probability, vol.6, issue.6, pp.899-929, 1978.
DOI : 10.1214/aop/1176995384

E. Even-dar, S. Mannor, and Y. Mansour, Action elimination and stopping conditions for the multi-armed bandit and reinforcement learning problems, The Journal of Machine Learning Research, vol.7, pp.1079-1105, 2006.

D. A. Freedman, On tail probabilities for martingales, The Annals of Probability, pp.100-118, 1975.

Y. Freund and R. E. Schapire, A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting, Journal of Computer and System Sciences, vol.55, issue.1, pp.119-139, 1997.
DOI : 10.1006/jcss.1997.1504

S. Gelly and Y. Wang, Exploration exploitation in go: UCT for Monte-Carlo go, Online trading between exploration and exploitation Workshop, Twentieth Annual Conference on Neural Information Processing Systems, 2006.
URL : https://hal.archives-ouvertes.fr/hal-00115330

P. Germain, A. Lacasse, F. Laviolette, and M. Marchand, PAC-Bayesian learning of linear classifiers, Proceedings of the 26th Annual International Conference on Machine Learning, ICML '09, pp.353-360, 2009.
DOI : 10.1145/1553374.1553419

J. C. Gittins, Multi-armed Bandit Allocation Indices, Wiley-Interscience series in systems and optimization, 1989.

S. Grünewälder, J. Audibert, M. Opper, and J. Shawe-taylor, Regret bounds for gaussian process bandit problems, Proceedings of the 14th International Conference on Artificial Intelligence and Statistics (AISTATS).

L. Györfi, M. Kohler, A. Krzyżak, and H. Walk, A Distribution-Free Theory of Nonparametric Regression, 2004.
DOI : 10.1007/b97848

A. György and G. Ottucsák, Adaptive Routing Using Expert Advice, The Computer Journal, vol.49, issue.2, pp.180-189, 2006.
DOI : 10.1093/comjnl/bxh168

D. Haussler, J. Kivinen, and M. K. Warmuth, Sequential prediction of individual sequences under general loss functions, IEEE Transactions on Information Theory, vol.44, issue.5, pp.1906-1925, 1998.
DOI : 10.1109/18.705569

D. Helmbold and S. Panizza, Some label efficient learning results, Proceedings of the tenth annual conference on Computational learning theory , COLT '97, pp.218-230, 1997.
DOI : 10.1145/267460.267502
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.179.437

W. Hoeffding, Probability Inequalities for Sums of Bounded Random Variables, Journal of the American Statistical Association, vol.58, issue.301, pp.13-30, 1963.
DOI : 10.1214/aoms/1177730491

J. H. Holland, Adaptation in natural and artificial systems, 1992.

J. Huang and T. Zhang, The benefit of group sparsity, The Annals of Statistics, vol.38, issue.4, 2009.
DOI : 10.1214/09-AOS778

R. Jenatton, J. Audibert, and F. Bach, Structured variable selection with sparsity-inducing norms, 2009.
URL : https://hal.archives-ouvertes.fr/inria-00377732

A. Juditsky and A. Nemirovski, Functional aggregation for nonparametric estimation, The Annals of Statistics, vol.28, pp.681-712, 2000.

A. Juditsky, P. Rigollet, and A. B. Tsybakov, Learning by mirror averaging, The Annals of Statistics, vol.36, issue.5, pp.2183-2206, 2008.
DOI : 10.1214/07-AOS546
URL : https://hal.archives-ouvertes.fr/hal-00341026

A. B. Juditsky, A. V. Nazin, A. B. Tsybakov, and N. Vayatis, Recursive aggregation of estimators by the mirror descent algorithm with averaging, Problems of Information Transmission, pp.368-384, 2005.

J. Kivinen and M. K. Warmuth, Exponentiated gradient versus gradient descent for linear predictors, Information and Computation, 1997.
DOI : 10.1006/inco.1996.2612
URL : http://doi.org/10.1006/inco.1996.2612

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandit problems in metric spaces, Proceedings of the 40th ACM Symposium on Theory of Computing, 2008.

R. Kleinberg, A. Slivkins, and E. Upfal, Multi-armed bandits in metric spaces, Proceedings of the fortieth annual ACM symposium on Theory of computing, STOC 08, pp.681-690, 2008.
DOI : 10.1145/1374376.1374475

R. D. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, Advances in Neural Information Processing Systems 17, pp.697-704, 2005.

R. D. Kleinberg, Nearly tight bounds for the continuum-armed bandit problem, 2004.

L. Kocsis and C. Szepesvári, Bandit Based Monte-Carlo Planning, Proceedings of the 17th European Conference on Machine Learning (ECML-2006), pp.282-293, 2006.
DOI : 10.1007/11871842_29
URL : http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.102.1296

V. Koltchinskii, Local Rademacher complexities and oracle inequalities in risk minimization, The Annals of Statistics, vol.34, issue.6, 2006.
DOI : 10.1214/009053606000001019

V. Koltchinskii and M. Yuan, Sparse recovery in large ensembles of kernel machines, Conference on Learning Theory, COLT, pp.229-238, 2008.

A. Lacasse, F. Laviolette, and M. Marchand, PAC-Bayesian Learning of Linear Classifiers, Proceedings of the 26th International Conference on Machine Learning, 2009.

A. Lacasse, F. Laviolette, M. Marchand, P. Germain, and N. Usunier, PAC-Bayes Bounds for the Risk of the Majority Vote and the Variance of the Gibbs Classifier, Advances in Neural Information Processing Systems, vol.19, p.769, 2007.

T. L. Lai and H. Robbins, Asymptotically efficient adaptive allocation rules, Advances in Applied Mathematics, vol.6, issue.1, pp.4-22, 1985.
DOI : 10.1016/0196-8858(85)90002-8

D. Lamberton, G. Pagès, and P. Tarrès, When can the two-armed bandit algorithm be trusted?, Annals of Applied Probability, vol.14, issue.3, pp.1424-1454, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00102253

G. R. Lanckriet, N. Cristianini, P. Bartlett, L. E. Ghaoui, and M. I. Jordan, Learning the kernel matrix with semidefinite programming, pp.27-72.

J. Langford, M. Seeger, and N. Megiddo, An improved predictive accuracy bound for averaging classifiers, Proceedings of the 18th International Conference on Machine Learning, 2001.

J. Langford and J. Shawe-taylor, PAC-Bayes & margins, Advances in Neural Information Processing Systems, pp.439-446, 2003.

F. Laviolette and M. Marchand, PAC-Bayes risk bounds for stochastic averages and majority votes of sample-compressed classifiers, Journal of Machine Learning Research, vol.8, pp.1461-1487, 2007.

G. Lecué, Suboptimality of Penalized Empirical Risk Minimization in Classification, Proceedings of the 20th annual conference on Computational Learning Theory (COLT), pp.142-156, 2007.
DOI : 10.1007/978-3-540-72927-3_12

G. Lecué and S. Mendelson, Aggregation via empirical risk minimization, Probability Theory and Related Fields, pp.591-613, 2009.

W. S. Lee, P. L. Bartlett, and R. C. Williamson, The importance of convexity in learning with squared loss, Proceedings of the ninth annual conference on Computational learning theory , COLT '96, pp.1974-1980, 1998.
DOI : 10.1145/238061.238082

K. Lounici, Sup-norm convergence rate and sign concentration property of Lasso and Dantzig estimators, Electronic Journal of Statistics, vol.2, issue.0, pp.90-102, 2008.
DOI : 10.1214/08-EJS177
URL : https://hal.archives-ouvertes.fr/hal-00222251

K. Lounici, M. Pontil, A. B. Tsybakov, and S. Van-de-geer, Taking advantage of sparsity in multi-task learning, Proceedings of the 22nd annual conference on Computational Learning Theory (COLT), 2009.

G. Lugosi and N. Vayatis, On the Bayes-risk consistency of regularized boosting methods, The Annals of Statistics, vol.32, issue.1, pp.30-55, 2004.
URL : https://hal.archives-ouvertes.fr/hal-00102140

C. L. Mallows, Some comments on Cp, Technometrics, vol.15, pp.661-675, 1973.

E. Mammen and A. B. Tsybakov, Smooth discrimination analysis, The Annals of Statistics, vol.27, pp.1808-1829, 1999.

O. Maron and A. W. Moore, Hoeffding races: Accelerating model selection search for classification and function approximation, NIPS, pp.59-66, 1993.

P. Massart, Some applications of concentration inequalities to statistics, Annales de la faculté des sciences de Toulouse Mathématiques, vol.9, issue.2, pp.245-303, 2000.
DOI : 10.5802/afst.961

P. Massart and E. Nédélec, Risk bounds for statistical learning, The Annals of Statistics, vol.34, issue.5, pp.2326-2366, 2006.
DOI : 10.1214/009053606000000786
URL : http://arxiv.org/abs/math/0702683

A. Maurer and M. Pontil, Empirical Bernstein Bounds and Sample Variance Penalization, 2009.

D. A. Mcallester, PAC-Bayesian model averaging, Proceedings of the twelfth annual conference on Computational learning theory , COLT '99, pp.164-170, 1999.
DOI : 10.1145/307400.307435

D. A. Mcallester, Simplified PAC-Bayesian Margin Bounds, COLT: Proceedings of the Workshop on Computational Learning Theory, 2003.
DOI : 10.1007/978-3-540-45167-9_16

L. Meier, S. Van-de-geer, and P. Bühlmann, High-dimensional additive modeling, The Annals of Statistics, vol.37, issue.6B, pp.3779-3821, 2009.
DOI : 10.1214/09-AOS692

N. Meinshausen and B. Yu, Lasso-type recovery of sparse representations for high-dimensional data, The Annals of Statistics, vol.37, issue.1, pp.246-270, 2009.
DOI : 10.1214/07-AOS582

S. Mendelson, Lower bounds for the empirical minimization algorithm, IEEE Transactions on Information Theory, vol.54, issue.8, pp.3797-3803, 2008.

N. Merhav and M. Feder, Universal prediction, IEEE Transactions on Information Theory, vol.44, issue.6, pp.2124-2147, 1998.
DOI : 10.1109/18.720534

C. A. Micchelli and M. Pontil, Learning the kernel function via regularization, Journal of Machine Learning Research, vol.6, issue.2, p.1099, 2006.

V. Mnih, C. Szepesvári, and J. Audibert, Empirical Bernstein stopping, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.672-679, 2008.
DOI : 10.1145/1390156.1390241
URL : https://hal.archives-ouvertes.fr/hal-00834983

A. Nemirovski, Lectures on probability theory and statistics. Part II: Topics in non-parametric statistics, Probability summer school, Saint Flour, 1998.

C. S. Ong, A. J. Smola, and R. C. Williamson, Learning the kernel with hyperkernels, Journal of Machine Learning Research, vol.6, pp.1043-1071, 2005.

L. E. Ortiz and L. P. Kaelbling, Sampling methods for action selection in influence diagrams, AAAI/IAAI, pp.378-385, 2000.

M. R. Osborne, B. Presnell, and B. A. Turlach, On the lasso and its dual, Journal of Computational and Graphical Statistics, pp.319-337, 2000.

G. Raskutti, M. J. Wainwright et al., Minimax rates of estimation for high-dimensional linear regression over lq-balls.

H. Robbins, Some aspects of the sequential design of experiments, Bulletin of the American Mathematical Society, vol.58, issue.5, pp.527-535, 1952.
DOI : 10.1090/S0002-9904-1952-09620-8

G. Schwarz, Estimating the dimension of a model, The Annals of Statistics, pp.461-464, 1978.

M. Seeger, PAC-Bayesian generalization error bounds for gaussian process classification, Informatics report series EDI-INF-RR-0094, 2002.

Y. Seldin and N. Tishby, Multi-classification by categorical features via clustering, Proceedings of the 25th international conference on Machine learning, ICML '08, pp.920-927, 2008.
DOI : 10.1145/1390156.1390272

R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, IEEE Transactions on Neural Networks, vol.9, issue.5, 1998.
DOI : 10.1109/TNN.1998.712192

M. Talagrand, Majorizing measures: the generic chaining, The Annals of Probability, vol.24, issue.3, pp.1049-1103, 1996.
DOI : 10.1214/aop/1065725175

O. Teytaud, S. Gelly, and M. Sebag, Anytime many-armed bandit, Conférence francophone sur l'Apprentissage automatique (CAp), 2007.

R. Tibshirani, Regression shrinkage and selection via the lasso, J. Roy. Stat. Soc. B, vol.58, pp.267-288, 1994.
DOI : 10.1111/j.1467-9868.2011.00771.x

A. B. Tsybakov, Optimal Rates of Aggregation, Computational Learning Theory and Kernel Machines, pp.303-313, 2003.
DOI : 10.1007/978-3-540-45167-9_23
URL : https://hal.archives-ouvertes.fr/hal-00104867

S. A. Van-de-geer, High-dimensional generalized linear models and the lasso, The Annals of Statistics, vol.36, issue.2, p.614, 2008.
DOI : 10.1214/009053607000000929

V. Vapnik, Estimation of Dependences Based on Empirical Data, 1982.

V. G. Vovk, Aggregating strategies, COLT '90: Proceedings of the third annual workshop on Computational learning theory, pp.371-386, 1990.
DOI : 10.1016/B978-1-55860-146-8.50032-1

V. G. Vovk, A game of prediction with expert advice, Journal of Computer and System Sciences, pp.153-173, 1998.

M. J. Wainwright, Sharp Thresholds for High-Dimensional and Noisy Sparsity Recovery Using $\ell_1$-Constrained Quadratic Programming (Lasso), IEEE Transactions on Information Theory, vol.55, issue.5, pp.2183-2202, 2009.
DOI : 10.1109/TIT.2009.2016018

Y. Wang, J. Audibert, and R. Munos, Algorithms for infinitely many-armed bandits, Advances in Neural Information Processing Systems (NIPS), pp.1729-1736, 2008.

O. Watanabe, Simple sampling techniques for discovery science, IEICE Transactions on Information and Systems, vol.1, pp.19-26, 2000.

Y. Yang, Combining Different Procedures for Adaptive Regression, Journal of Multivariate Analysis, vol.74, issue.1, pp.135-161, 2000.
DOI : 10.1006/jmva.1999.1884
URL : http://doi.org/10.1006/jmva.1999.1884

Y. Yang, Adaptive Regression by Mixing, Journal of the American Statistical Association, vol.96, issue.454, pp.574-588, 2001.
DOI : 10.1198/016214501753168262

Y. Yang, Aggregating regression procedures to improve performance, Bernoulli, vol.10, issue.1, pp.25-47, 2004.
DOI : 10.3150/bj/1077544602

R. Yaroshinsky, R. El-yaniv, and S. S. Seiden, How to Better Use Expert Advice, Machine Learning, vol.55, issue.3, pp.271-309, 2004.
DOI : 10.1023/B:MACH.0000027784.72823.e4

M. Yuan and Y. Lin, Model selection and estimation in regression with grouped variables, Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol.58, issue.1, p.49, 2006.
DOI : 10.1198/016214502753479356

T. Zhang, Information-theoretic upper and lower bounds for statistical estimation, IEEE Transactions on Information Theory, vol.52, issue.4, pp.1307-1321, 2006.
DOI : 10.1109/TIT.2005.864439

T. Zhang, From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation, Annals of Statistics, vol.34, issue.5, 2007.

P. Zhao and B. Yu, On model selection consistency of Lasso, Journal of Machine Learning Research, vol.7, pp.2541-2563, 2006.