Maximize parallelism minimize overhead for nested loops via loop Striping, J. VLSI Sig. Proc. Syst, vol.47, issue.2, pp.153-167, 2007. ,
Achieving full parallelism using multi-dimensional retiming, J. IEEE Trans. Par. Dist. Syst, vol.7, issue.5, pp.1150-1163, 1996. ,
Fully parallel hardware/software codesign for multidimensional DSP applications, Proceedings of the 4th International Workshop on Hardware/Software Co-Design (CODES'96), pp.18-25, 1996. ,
DOI : 10.1109/hcs.1996.492222
URL : http://www.cse.nd.edu/~esha/papers/nelson/iwhsc96.ps
Effective Loop Partitioning and Scheduling under Memory and Register Dual Constraints, Proceedings of the conference on Design, automation and test in Europe (DATE'08). Munich(Germany), pp.1202-1207, 2008. ,
Design space minimization with timing and code size optimization for embedded DSP, Proceedings of the 1st IEEE/ACM/IFIP international conference on Hardware/software codesign & system synthesis , CODES+ISSS '03, pp.144-149, 2003. ,
DOI : 10.1145/944645.944685
Scheduling of Uniform Multi-Dimensional Systems under Resource Constraints, J. IEEE Trans. VLSI Syst, vol.6, issue.4, pp.719-730, 1998. ,
Timing optimization via nest-loop pipelining considering code size, Microprocessors and Microsystems, vol.32, issue.7, pp.351-363, 2008. ,
DOI : 10.1016/j.micpro.2008.02.002
Theoretical Constraints on Multi-Dimensional Retiming Design Techniques, Proc. Of Visual Information Processing X, pp.238-245, 2001. ,
Extended retiming: Optimal scheduling via a graphtheoretical approach, IEEE conference on the Acoustics, Speech, and Signal Processing (ICASSP'99), pp.2001-2004, 1999. ,
Retiming synchronous circuitry, Algorithmica, vol.6, pp.1-6, 1991. ,
Timing and Code Size Optimization on Achieving Full Parallelism in Uniform Nested Loop, J. of comput, vol.3, issue.7, pp.68-77, 2011. ,
Efficient retiming of large circuits, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.6, issue.1, pp.74-83, 1998. ,
DOI : 10.1109/92.661250
URL : http://www-cad.eecs.berkeley.edu/HomePages/wjiang/ee219b/sapatnekar_retime.pdf
Deriving a new efficient algorithm for min-period retiming, Proceedings of the 2005 conference on Asia South Pacific design automation , ASP-DAC '05, pp.990-993, 2005. ,
DOI : 10.1145/1120725.1120774
An efficient incremental algorithm for min-area retiming, Proceedings of the 45th annual conference on Design automation, DAC '08, pp.528-533, 2008. ,
DOI : 10.1145/1391469.1391603
URL : http://users.eecs.northwestern.edu/~haizhou/publications/dac08wang.pdf
Efficient minarea retiming of large level-clocked circuits, Proceedings Design, Automation and Test in Europe, pp.840-847, 1998. ,
DOI : 10.1109/DATE.1998.655956
URL : http://www.ee.umn.edu/users/sachin/pubhtml/../PUBS/date98.pdf
Single-dimension software pipelining for multidimensional loops, ACM Transactions on Architecture and Code Optimization, vol.4, issue.1, pp.163-174, 2007. ,
DOI : 10.1145/1216544.1216550
URL : http://www.cgo.org/cgo2004/papers/13_86_rong_h.pdf
Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model, Lecture Notes in Computer Science, vol.4959, pp.132-146, 2008. ,
DOI : 10.1007/978-3-540-78791-4_9
URL : https://link.springer.com/content/pdf/10.1007%2F978-3-540-78791-4_9.pdf
Efficient nested loop pipelining in high level synthesis using polyhedral bubble insertion, 2011 International Conference on Field-Programmable Technology, pp.1-10, 2011. ,
DOI : 10.1109/FPT.2011.6132715
URL : https://hal.archives-ouvertes.fr/hal-00746434
Outer Loop Pipelining for Application Specific Datapaths in FPGAs, IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol.16, issue.10, pp.10-1268, 2008. ,
DOI : 10.1109/TVLSI.2008.2001744
Software Pipelining of Nested Loops, Lecture Notes in Computer Science, vol.2027, pp.165-181, 2001. ,
DOI : 10.1007/3-540-45306-7_12
URL : https://link.springer.com/content/pdf/10.1007%2F3-540-45306-7_12.pdf
Code-size conscious pipelining of imperfectly nested loops, Proceedings of the 2007 workshop on MEmory performance DEaling with Applications, systems and architecture, MEDEA '07, pp.49-55, 2007. ,
DOI : 10.1145/1327171.1327177
URL : https://hal.archives-ouvertes.fr/hal-00646688
Improving performance through deep value profiling and specialization with code transformation, J. Comp. Lang. Syst. Struct, vol.37, issue.4, pp.193-203, 2011. ,
Software Pipelining in Nested Loops with Prolog-Epilog Merging, Lecture Notes in Comp. Sc, vol.18, issue.4, pp.80-94, 2009. ,
DOI : 10.1007/3-540-36579-6_2
URL : https://hal.archives-ouvertes.fr/inria-00445489
Split tiling for GPUs, Proceedings of the 6th Workshop on General Purpose Processor Using Graphics Processing Units, GPGPU-6, pp.24-31, 2013. ,
DOI : 10.1145/2458523.2458526
URL : https://hal.archives-ouvertes.fr/hal-00786812
Loop Distribution and Fusion with Timing and Code Size Optimization, Journal of Signal Processing Systems, vol.18, issue.2, pp.325-340, 2011. ,
DOI : 10.1177/1094342004038956
Model-guided empirical tuning of loop fusion, International Journal of High Performance Systems Architecture, vol.1, issue.3, pp.183-198, 2008. ,
DOI : 10.1504/IJHPSA.2008.021798
Optimal loop parallelization for maximizing iteration-level parallelism, Proceedings of the 2009 international conference on Compilers, architecture, and synthesis for embedded systems, CASES '09, pp.67-76, 2009. ,
DOI : 10.1145/1629395.1629407
URL : http://www.cse.unsw.edu.au/~jingling/papers/cases09.pdf
A two-level scheduling method: an effective parallelizing technique for uniform nested loops on a DSP multiprocessor, Journal of Systems and Software, vol.75, issue.1-2, pp.1-2, 2005. ,
DOI : 10.1016/j.jss.2003.02.001
Iterational Retiming with Partitioning: Loop Scheduling with Complete Memory Latency Hiding, J. ACM Trans. Emb. Comp. Syst, vol.9, issue.3 22, 2010. ,
Combining extended retiming and unfolding for rate-optimal graph transformation, J. VLSI Sign. Proc, vol.39, issue.3, pp.273-293, 2005. ,
Iterative optimization in the polyhedral model: Part II, multidimensional time, Proceedings of the 2008 ACM SIGPLAN conference on Programming language design and implementation (PLDI'08), pp.90-100, 2008. ,
DOI : 10.1109/cgo.2007.21
URL : https://hal.archives-ouvertes.fr/hal-01257273
TEG: GPU Performance Estimation Using a Timing Model, 2011. ,
URL : https://hal.archives-ouvertes.fr/hal-00641726
Execution Time and Code Size Optimization Using Multidimensional Retiming and Loop Striping, 2013 Euromicro Conference on Digital System Design, pp.462-466, 2013. ,
DOI : 10.1109/DSD.2013.132
Exploring speculative procedure and loop level parallelism in SPLASH2, International Journal of High Performance Systems Architecture, vol.5, issue.2, pp.84-92 ,
DOI : 10.1504/IJHPSA.2014.061439
PRADA: a high-performance reconfigurable parallel architecture based on the dataflow model, Int. J. of High Performance Systems Architecture, vol.3, issue.1, pp.41-55, 2011. ,