References that cite/use TMG are listed below. Please let us know if you have an item you would like us to add to the list!


  1. E. Kokiopoulou and Y. Saad, "Polynomial Filtering in Latent Semantic Indexing for Information Retrieval", Proc. SIGIR'04, pp. 104-111, 2004.


  1. James Baglama and Lothar Reichel, "Augmented implicitly restarted Lanczos bidiagonalization methods", SIAM J. Sci. Comput., 27(1):19-42, 2005.
  2. Coskun Bayrak, "Learning Contextual Behavior of Text Data". In Proc. 4th Int'l. Machine Learning and Applications Conf. (ICMLA'05), Dec. 2005.
  3. Colin R. Buchanan, "Semantic-based Audio Recognition and Retrieval", M.S. Thesis, School of Informatics, University of Edinburgh, 2005.
  4. Marco Lormans and Arie van Deursen, "Reconstructing Requirements Coverage Views from Design and Test using Traceability Recovery via LSI", Proc. 3rd Int'l Workshop on Traceability in Emerging Forms of Software Engineering, Long Beach, California, pp. 37 - 42, 2005.
  5. Wendy L. Martinez and Angel Martinez, "Exploratory Data Analysis With Matlab", CRC Press, 2005.
  6. Sanfeng Zhang, Guoxin Wu, Gang Chen, and Libo Xu, "On Building and Updating Distributed LSI for P2P Systems", Lecture Notes in Computer Science, v.3759, Springer, Oct. 2005.


  1. Xiang Wang and Xiaoming Jin, "Understanding and Enhancing the Folding-In Method in Latent Semantic Indexing". In "Database and Expert Systems Applications", Lecture Notes in Computer Science, v. 4080/2006, Springer, Berlin, 2006.
  2. Ioannis Antonellis, Christos Bouras and Vassilis Poulopoulos, "Personalized News Categorization through Scalable Text Classification", in Proc. of 8th Asia Pacific Web Conference: Frontiers of WWW Research and Development - APWEB 2006, pp. 391-401, Springer-Berlin, 2006.
  3. Rajeev Agrawal, William Grosky and Farshad Fotouhi, "Image Retrieval Using Multimodal Keywords". Proc. Eighth IEEE International Symposium on Multimedia (ISM'06) pp. 817-822, 2006.
  4. Demurjian, S., Rajasekaran, S., Ammar, R., Greenshields, I., Doan, T., He, L., "Applying LSI and data reduction to XML for counter terrorism", IEEE Aerospace Conference Proceedings 2006, art. no. 1656047.
  5. Daniel M. Dunlavy, Tamara G. Kolda, and W. Philip Kegelmeyer, "Multilinear algebra for analyzing data with multiple linkages", Sandia National Laboratories Technical Report SAND2006-2079, April 2006.
  6. Julia Ekstrom, "Quantifying Institutional Interplay in the California Current Large Marine Ecosystem", presented at Institutional Dimensions of Global Environmental Change (IDGEC) Synthesis Conference, Bali, Dec. 2006.
  7. Dominic Forest, "Application de techniques de forage de textes de nature prédictive et exploratoire à des fins de gestion et d'analyse thématique de documents textuels non structurés", Ph.D. Thesis, Université du Québec à Montréal, June 2006.
  8. Lars Eldén, "Numerical linear algebra in data mining". Acta Numerica, pp. 327–384, Cambridge Univ. Press, 2006.
  9. Hyunsoo Kim and Haesun Park, "Extracting Unrecognized Gene Relationships from the Biomedical Literature via Matrix Factorizations using a Priori Knowledge of Gene Relationships". In Proc. 1st Int'l. workshop on Text mining in bioinformatics held with Conference on Information and Knowledge Management (CIKM'06), pp. 60 - 67, 2006.
  10. Hyunsoo Kim, Haesun Park, "Discriminant Analysis using Nonnegative Matrix Factorization for Nonparametric Multiclass Classification". In Proc. 2006 IEEE International Conference on Granular Computing, May 2006.
  11. Marco Lormans and Arie van Deursen, "Can LSI help Reconstructing Requirements Traceability in Design and Test?" In Proc. CSMR'06: 10th European Conference on Software Maintenance and Reengineering, Bari, Italy, 2006.
  12. Flora S. Tsai, Yun Chen, and Kap Luk Chan, "Probabilistic Latent Semantic Analysis for Search and Mining of Corporate Blogs", Technical Report EEE3/005/2006, School of Electrical and Electronic Engineering, Nanyang Technological University, pp. 1-10, 2006.
  13. Mu Zhu and Ali Ghodsi, "Automatic dimensionality selection from the scree plot via the use of profile likelihood", Computational Statistics & Data Analysis, 51, pp. 918-930, 2006.


  1. Zheng Zhao and Huan Liu, "Semi-supervised Feature Selection via Spectral Analysis". In Proc. 2007 SIAM International Conference on Data Mining. Also Technical Report, TR-06-022, Computer Science and Engineering Dept., Arizona State Univ., Tempe, AZ, 85281.
  2. Flora S. Tsai and Kap Luk Chan, "Detecting Cyber Security Threats in Weblogs Using Probabilistic Models", Intelligence and Security Informatics, Lecture Notes in Computer Science 4430/2007, Springer.
  3. Amirali Noorinaeini and Mark R. Lehto, "Hybrid singular value decomposition; a model of human text classification", Int. J. Human Factors Modelling and Simulation, 1(1):96-118, 2006.
  4. Jose Quesada, "Creating your own LSA space". In Handbook of Latent Semantic Analysis, T. Landauer, D. McNamara, S. Dennis & W. Kintsch (Eds), Routledge, 2007.
  5. Hyunsoo Kim, Haesun Park and Hongyuan Zha, "Distance Preserving Dimension Reduction for Manifold Learning". In Proc. 2007 SIAM Int'l. Conf. Data Mining (SDM'07).
  6. Hyunsoo Kim, Haesun Park and Hongyuan Zha, "Distance Preserving Dimension Reduction Using the QR Factorization or the Cholesky Factorization", Proc. 7th IEEE Int'l. Conf. Bioinformatics and Bioengineering, 2007 (BIBE 2007), Oct. 2007. pp. 263-269.
  7. Lars Eldén, "Matrix Methods in Data Mining and Pattern Recognition", SIAM, 2007.
  8. Yun Chen, Flora S. Tsai, and Kap Luk Chan, "Blog Search and Mining in the Business Domain", 2007 ACM SIGKDD Workshop on Domain Driven Data Mining (DDDM2007), August 12, 2007, San Jose, California, USA.
  9. Rajeev Agrawal, Changhua Wu, William I. Grosky, Farshad Fotouhi, "Image Clustering Using Visual and Text Keywords". In the 7th IEEE International Symposium on Computational Intelligence in Robotics and Automation (CIRA2007), 2007.
  10. H. Jair Escalante, Carlos A. Hernandez, Aurelio Lopez, Heidy M. Marín, Manuel Montes, Eduardo Morales, Luis E. Sucar, Luis Villasenor, "TIA-INAOE’s Participation at ImageCLEF 2007", Coordinación de Ciencias Computacionales, Instituto Nacional de Astrofísica, Óptica y Electrónica, Puebla, Mexico.
  11. I. Kanaris, K. Kanaris, I. Houvardas, and E. Stamatatos, "Words vs. Character N-grams for Anti-spam Filtering", Int. Journal on Artificial Intelligence Tools, World Scientific, 16(6), pp. 1047-1067, 2007.
  12. Tamsin Maxwell, "Exploring the Music Genome: Lyric Clustering with Heterogeneous Features", M.Sc. thesis, Cognitive Science and Natural Language Processing, School of Informatics, University of Edinburgh, 2007.
  13. Hans-Gerhard Gross, Marco Lormans and Jun Zhou, "Towards Software Component Procurement Automation", Delft University of Technology, Software Engineering Research Group, Technical Report TUD-SERG-2007-002, 2007. Also in "Electronic Notes in Theoretical Computer Science" (ENTCS), 189:51-68, July 2007.


  1. Shaina Race, "Data Clustering via Dimension Reduction and Algorithm Aggregation", Master's Thesis (Prof. Carl Meyer advisor), NCSU, Raleigh, North Carolina, 2008.
  2. Alexander Salamanca and Elizabeth Leon, "An Integrated Architecture for Personalized Query Expansion in Web Search", AAAI Workshop, 2008.
  3. H. Jair Escalante, Carlos A. Hernandez, Luis E. Sucar, Manuel Montes, "Late fusion of heterogeneous methods for multimedia image retrieval", Proceeding of the 1st ACM international conference on Multimedia information retrieval, pp. 172-179, Vancouver, British Columbia, Canada, 2008.
  4. Taner Danisman and Adil Alpkocak, "Feeler: Emotion Classification of Text Using Vector Space Model". In AISB 2008 Convention, Communication, Interaction and Social Intelligence, vol. 2 ("Affective Language in Human and Machine"), pp. 53-59, Aberdeen, UK, April 2008.
  5. Georgina Cosma, "An Approach to Source-Code Plagiarism Detection and Investigation Using Latent Semantic Analysis", Ph.D. Thesis, University of Warwick, Department of Computer Science, July 2008.
  6. Jihong Liu and Soo-Young Lee, "Study on feature select based on coalitional game", in Neural Networks and Signal Processing, 2008 International Conference on, 2008, 445-450.
  7. Alexandra Laflamme-Sanders and Mu Zhu, "LAGO on the unit sphere", Neural Networks 21, no. 9 (November 2008): 1220-1223.
  8. David Fritzsche, Volker Mehrmann, Daniel B. Szyld, and Elena Virnik, "An SVD approach to identifying meta-stable states of Markov chains", Research Report 06-08-4, Department of Mathematics, Temple University, August 2006 and Electron. Trans. Numer. Anal., 29: 46-69, 2008.
  9. Tobias Tornfeld, "Graph Similarity, Parallel Texts, and Automatic Bilingual Lexicon Acquisition", MS Thesis, Department of Mathematics, Linköpings Universitet, April 2008.


  1. Syed Nadeem Ahsan, Javed Ferzund and Franz Wotawa, "Automatic Software Bug Triage System (BTS) Based on Latent Semantic Indexing and Support Vector Machine". Proc. Fourth International Conference on Software Engineering Advances (ICSEA 2009), Porto, Portugal, 2009.
  2. Jie Chen and Yousef Saad, "Divide and Conquer Strategies for Effective Information Retrieval", 2009 SIAM Data Mining Conf.
  3. Jie Chen and Yousef Saad, "Lanczos Vectors versus Singular Vectors for Effective Dimension Reduction". IEEE Transactions on Knowledge and Data Engineering (TKDE), 21(8):1091-1103, 2009.
  4. Pablo de Castro et al., "Query expansion using an immune-inspired biclustering algorithm", Natural Computing, April 2009.
  5. Claudia Marcela González, "Análisis de citación y de redes sociales para el estudio del uso de revistas en centros de investigación", Ci. Inf., Brasília, v. 38, n. 2, p. 46-55, maio/ago. 2009.
  6. Stacey M.L. Hendrickson, "The Wrong Wright Stuff: Mapping Human Error in Aviation", Ph.D. thesis, Dept. Psychology, The University of New Mexico, Albuquerque, New Mexico, May 2009.
  7. Amir Hossein Jadidinejad and Hadi Amiri, "Local Cluster Analysis as a Basis for High-Precision Information Retrieval", INFOS2008, March 27-29, 2008 Cairo-Egypt.
  8. Amir Hossein Jadidinejad and Abolfazl Toroghi Haghighat, "Local Cluster Analysis: A New Approach for Evaluating Different Document Clustering Algorithms by Huge Corpora", International Conference on Asian Language Processing (IALP 2008), 12-14 November 2008, Chiang Mai, Thailand.
  9. Marco Kalz, Jan van Bruggen, Bas Giesbers, Ellen Rusman, Jannes Eshuis and Wim Waterink, "A Validation Scenario for a Placement Service in Learning Networks", Learning Network Services for Professional Development, DOI 10.1007/978-3-642-00978-5_12, Springer, 2009.
  10. Inayatullah Khan, Amir Saffari, and Horst Bischof, "TVGraz: Multi-Modal Learning of Object Categories by Combining Textual and Visual Features", Proc. 33rd Workshop of the Austrian Association for Pattern Recognition, AAPR / ÖAGM 2009, pp. 213-224.
  11. Ingyu Lee, Byung-Won On, and Seong No Yoon, "Algebraic Algorithms to Solve Name Disambiguation Problem", Int'l. Conf. Data Mining (DMIN), Las Vegas, Nevada, USA, July 13-16, 2009.
  12. Ingyu Lee and Byung-Won On, "Name Disambiguation using Multi-Level Multi-Resolution (MLMR) Graph Partitioning", Int'l. Conf. Artificial Intelligence (ICAI), Las Vegas, Nevada, USA, July 13-16, 2009.
  13. Wim van der Vegt, Marco Kalz, Bas Giesbers, Fridolin Wild and Jan van Bruggen, "Tools and Techniques for Placement Experiments", Learning Network Services for Professional Development, DOI 10.1007/978-3-642-00978-5_12, Springer, 2009.
  14. Yusuf Yaslan and Zehra Cataltepe, "Random Relevant and Non-redundant Feature Subspaces for Co-training", IDEAL'2009, Lecture Notes in Computer Science, 2009, Volume 5788/2009, 679-686.
  15. Quanquan Gu and Jie Zhou, "Local Relevance Weighted Maximum Margin Criterion for Text Classification", 2009 SIAM Data Mining Conf.


  1. Sunghwan Mac Kim, Alessandro Valitutti, and Rafael A. Calvo. 2010. Evaluation of unsupervised emotion models to textual affect recognition. In Proceedings of the NAACL HLT 2010 Workshop on Computational Approaches to Analysis and Generation of Emotion in Text (CAAGET '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 62-70.
  2. Yusuf Yaslan, Zehra Cataltepe, Co-training with relevant random subspaces, Neurocomputing, Volume 73, Issues 10-12, June 2010, Pages 1652-1661.
  3. Jie Chen, Numerical Linear Algebra Techniques for Effective Data Analysis, Ph.D. Thesis, University of Minnesota, Sept. 2010.
  4. Liu Hui, Cao Yonghui, "Research Intrusion Detection Techniques from the Perspective of Machine Learning," mmit, vol. 1, pp.166-168, 2010 Second International Conference on MultiMedia and Information Technology.
  5. Harish, B., Guru, D., Manjunath, S. and Dinesh, R. "Cluster Based Symbolic Representation and Feature Selection for Text Classification", in Advanced Data Mining and Applications, L. Cao et al. eds., Lecture Notes in Computer Science, vol. 6441, pp. 158-166, 2010.
  6. P. Magdalinos, "Linear and non-linear dimensionality reduction for distributed knowledge discovery", Ph.D. thesis, Athens University of Economics and Business, May 2010.
  7. E. Kokiopoulou, D. Kressner and Y. Saad, Linear dimension reduction for evolutionary data, Research Report No. 2010-42, ETH, Zurich, December 2010.
  8. Benveniste, Steven M., Investigation into Text Classification With Kernel Based Schemes, MS Thesis, Naval Postgraduate School, California, March 2010.
  9. P. de Castro, F. de França, H. Ferreira, G. Coelho and F. Von Zuben, Query expansion using an immune-inspired biclustering algorithm, Natural Computing, 9(3):579-602, 2010.
  10. Scott Hendry and Alison Madeley, "Text Mining and the Information Content of Bank of Canada Communications", Bank of Canada Working Paper 2010-31, Nov. 2010.
  11. C.A. Friedman, J. Huang and Y. Huang, "Finding Stress Scenarios that Get the Job Done, with a Credit Risk Application", Social Science Research Network, Dec. 2010.
  12. Mircea Trifan and Dan Ionescu, A new search method for ranking short text messages using semantic features and cluster coherence, Int'l. Joint Conf. Computational Cybernetics and Technical Informatics (ICCC-CONTI), 27-29 May 2010, pp. 643 - 648.
  13. Guiying Wei, Xuedong Gao and Sen Wu, Study of text classification methods for data sets with huge features, 2nd Int'l. Conf. Industrial and Information Systems (IIS), pp. 433-436, 2010, DOI 10.1109/INDUSIS.2010.5565817.
  14. Katie Wolf, "Reading Between the Tweets: Clustering and Topic Extraction on Micro-Blogs", Undergraduate Research poster, University of Minnesota, 2010.


  1. Zheng Zhao, Lei Wang, Huan Liu, Jieping Ye, "On Similarity Preserving Feature Selection," IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2011.
  2. Nandita Tripathi, Michael Oakes and Stefan Wermter, Semantic subspace learning for text classification using hybrid intelligent techniques, International Journal of Hybrid Intelligent Systems 8 (2011) 99–114.
  3. Nandita Tripathi, Michael Oakes and Stefan Wermter, Hybrid Parallel Classifiers for Semantic Subspace Learning, ICANN'2011, Part II, LNCS 6792, pp. 64–70, 2011.
  4. B. S. Harish, D. S. Guru, S. Manjunath, and Bapu B. Kiranagi. A symbolic approach for text classification based on dissimilarity measure. In Proc. of the First International Conference on Intelligent Interactive Technologies and Multimedia (IITM '10), M. D. Tiwari, R. C. Tripathi, and Anupam Agrawal (Eds.). ACM, New York, NY, USA, 104-108. 
  5. S. Tiwari and K. Ramanathan, Utilizing Hubel Wiesel models for semantic associations and topics extraction from unstructured text, in The 2011 International Joint Conference on Neural Networks (IJCNN), pp. 892-898.
  6. Ruichu Cai, Zhenjie Zhang, Zhifeng Hao, BASSUM: A Bayesian semi-supervised method for classification feature selection, Pattern Recognition, Volume 44, Issue 4, April 2011, Pages 811-820.
  7. I. Lee and B.-W. On, An effective web document clustering algorithm based on bisection and merge, Artificial Intelligence Review, 36(1):69-85, 2011.
  8. I. Lee and B.-W. On, Weighted hybrid features to resolve mixed entities, 6th Int'l. Conf. on Digital Inf. Mgmt. (ICDIM) 2011.
  9. Scott Deann Chen, An exploration of multimodal document classification strategies, MS Thesis in ECE, University of Illinois at Urbana-Champaign, 2011.
  10. Michele Filannino, "DBWorld e-mail classification using a very small corpus", Machine Learning final project, The University of Manchester, 2011.
  11. P. Pitchandi and N. Raju, "Improving the Performance of Multivariate Bernoulli Model based Documents Clustering Algorithms using Transformation Techniques", Journal of Computer Science, 7(5):762-769, 2011.
  12. Xiang Wang, Xiaoming Jin, Meng-En Chen, Kai Zhang, and Dou Shen, "Topic Mining over Asynchronous Text Sequences", to appear in IEEE TKDE.
  13. Sujoy Roy, Ramin Homayouni, Michael W. Berry and Andrey A. Puretskiy, "Nonnegative Tensor Factorization of Biomedical Literature for Analysis of Genomic Data", 2011 SIAM Text Mining Workshop.
  14. Flora S. Tsai, A tag-topic model for blog mining, Expert Systems with Applications Volume 38, Issue 5, May 2011, pp. 5330-5335.




  1. Stacy Rebich, "A map of Learning to Think Spatially", Final project presentation for MAT 259 (Professor George Legrady): Visualizing Information, Dept. Geography, University of California, Santa Barbara.
  2. Shaina Race, "Nonnegative factorization with sparseness constraints", Final project presentation for Prof. Carl Meyer's course "Special Topics: Web Search, Information Retrieval, and Data Mining", NCSU, Raleigh, North Carolina, Spring 2007. See also the extended class report.
  3. Alex Villacorta, "Information Diffusion of Newspaper Articles", Dept. of Sociology, University of California, Santa Barbara, 2006.
  4. Steve Vincent, "Text Extraction, Similarity and WordNet". Presentation for class "Special Topics in Data Mining Applications" taught by Prof. Carlotta Domeniconi, George Mason University.

Class and research use



  1. In Text mining and information retrieval in class "Matrix Methods in Data Mining and Pattern Recognition" taught by Prof. Lars Eldén at Dept. Mathematics, Linköping University, Sweden, Nov. 2006.
  2. In COMM2A Data Mining and Knowledge Based Systems taught by Dr. Ken McGarry, Sunderland University, UK.
  3. In class CITM02 TOPIC 4, Intelligent techniques and their application in semantic web applications taught by Dr. Ken McGarry, Sunderland University, UK.
  4. "Image Annotation Group at the University of Illinois at Urbana-Champaign"
  5. Janata Lab, Center for Mind and Brain, University of California, Davis.
  6. In Machine Learning taught by Professor Carlotta Domeniconi, George Mason University, Fall 2007.
  7. In "Ensemble Based Systems in Decision Making", taught during Spring 2009 by Professor Carlotta Domeniconi, George Mason University.
  8. In Algorithms for Classification and Prediction taught by Professor Kevin P. Murphy, Univ. British Columbia, Spring 2007.
  9. In Computer Networks taught by Professor Chyouhwa Chen, Dept. of Computer Science and Information Engineering, National Taiwan University of Science and Technology, Fall 2007.
  10. In assignment "Applied Linear Algebra" taught by Professor Tobias Damm, University of Padova and Technische Universitaet Kaiserslautern.
  11. In the Spoken Word Project (Milestone Report) prepared by Nicholas D. Lane for "Machine Learning and Statistical Data Analysis (CS034/CS134, Spring 2009)", taught by Lorenzo Torresani at Dartmouth College.
  12. In the student project report "An Investigation of the Latent Semantic Analysis Technique for Document Retrieval" prepared by David Muchangi Mugo under the supervision of Prof. Dr. rer. nat. Ralf Möller and Mr. A. Kaya, Technische Universitaet Hamburg, 2009.

Some of our papers utilizing TMG



  1. I. Antonellis and E. Gallopoulos, Exploring term-document matrices from matrix models in text mining. In Proc. SIAM Text Mining 2006 Workshop, held in conjunction with the 6th SIAM Int'l. Conf. Data Mining (SDM 2006, April 20-22). Also TR HPCLAB-SCG 3/02-06, CEID, University of Patras, Feb. 2006.
  2. C. Boutsidis and E. Gallopoulos, SVD based initialization: A head start for nonnegative matrix factorization, Pattern Recognition, 41(4):1350-1362, April 2008.
  3. D. Zeimpekis and E. Gallopoulos, Linear and non-linear dimensional reduction via class representatives for text classification, In Proc. of the 2006 IEEE International Conference on Data Mining (Hong Kong), December 2006, pp. 1172-1177.
  4. D. Zeimpekis and E. Gallopoulos, k-means steering of spectral divisive clustering algorithms, In Proc. of Text Mining Workshop held in conjunction with the 7th SIAM Int'l. Conf. Data Mining (SDM 2007, Minneapolis).
  5. D. Zeimpekis and E. Gallopoulos, TMG: A MATLAB toolbox for generating term document matrices from text collections, Grouping Multidimensional Data: Recent Advances in Clustering (J. Kogan, C. Nicholas, and M. Teboulle, eds.), Springer, Berlin, 2006, pp. 187-210.
  6. D. Zeimpekis and E. Gallopoulos, CLSI: A flexible approximation scheme from clustered term-document matrices, In Proc. SIAM 2005 Data Mining Conf. (Newport Beach, California) (H. Kargupta, J. Srivastava, C. Kamath, and A. Goodman, eds.), April 2005, pp. 631-635.
  7. D. Zeimpekis and E. Gallopoulos, PDDP(l): Towards a Flexible Principal Direction Divisive Partitioning Clustering Algorithm, Proc. IEEE ICDM '03 Workshop on Clustering Large Data Sets (Melbourne, Florida) (D. Boley, I. Dhillon, J. Ghosh, and J. Kogan, eds.), 2003, pp. 26-35.