The NTU Information Engineering Team
Shines at ACM SIGKDD 2010 by Winning Top Awards
A research paper entitled "Large Linear Classification When Data Cannot Fit in Memory," co-authored by Professor Chih-Jen Lin of the NTU Department of Computer Science and Information Engineering and his students Hsiang-Fu Yu, Cho-Jui Hsieh, and Kai-Wei Chang, was presented at the Sixteenth ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2010, the world's leading international conference on data mining), held in Washington, D.C., from July 25 to 28. Chosen from among 578 submissions, it won the Best Research Paper Award, the first time a team from Taiwan has received this prestigious honor. In addition, Professor Chih-Jen Lin, together with Professor Hsuan-Tien Lin and Professor Shou-De Lin of the same department, led a research team in KDD Cup 2010, the world's most important data mining competition, which was organized alongside the conference. Competing against more than 100 teams, the NTU team won first place in both the student group and the general group. This was the third consecutive year that a Taiwanese team earned a top result at the KDD Cup (first place worldwide in 2008 and third place in 2009). These awards confirm the excellence and leadership of National Taiwan University's Information Engineering team in machine learning and data mining research.
What is the KDD Conference?
The KDD Conference is the top conference in the field of data mining. It is organized every year by ACM, the most authoritative organization in computer science, and brings together scholars and experts from academia, industry, and government agencies to present the most forward-looking results on problems in data mining. More than 1,000 scholars from the U.S., Europe, Australia, China, Japan, and Taiwan attend each year. Papers are solicited through an open call and must pass anonymous peer review before acceptance. This year, 578 manuscripts were submitted and only 101 were accepted, an acceptance rate of about 17.5%. In addition to Professor Chih-Jen Lin's team winning the Best Research Paper Award, Professor Ming-Syan Chen of the Department of Electrical Engineering had two papers accepted, and Professor Shin-Mu Vincent Tseng of National Cheng Kung University had another. According to statistics compiled by the conference, these four accepted papers gave Taiwan the second-highest acceptance rate in the world, behind only Australia.
Large Linear Classification When Data Cannot Fit in Memory
Linear classification has important applications in document classification and in large-scale analysis of Internet data. However, when the data grow too large to fit in memory, existing linear methods run into serious computational bottlenecks. Professor Lin's research team proposed a new framework that combines theoretical analysis with practical design to address this bottleneck, so that an ordinary user can solve linear classification problems on more than 100 GB of data with a personal computer. At the public award ceremony, the judges of the Best Research Paper Award described Professor Lin's paper as a "good combination of theory ideas and engineering ideas and a solid evaluation for a very relevant problem," noted that it "addresses a central task that is specifically a KDD task. Impressive results on large data," and concluded that it "can be proved really useful to the community on a wide spectrum of problems."
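The article does not describe how the method works. As a rough illustration of the idea in the paper's title, the sketch below trains a linear SVM while touching only one block of instances at a time, keeping just the weight vector and one dual variable per instance in memory. It assumes an L1-loss linear SVM without a bias term and uses dual coordinate descent as the per-block solver; the function names, the in-memory "blocks" standing in for files on disk, and the toy data are hypothetical, and this is not the authors' implementation.

```python
import random
from typing import Callable, Dict, Iterator, List, Tuple

# One training instance: (dense feature vector, label in {-1, +1}).
Instance = Tuple[List[float], int]
Block = List[Instance]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_minimization_svm(load_blocks: Callable[[], Iterator[Block]],
                           n_features: int, C: float = 1.0,
                           outer_iters: int = 10, seed: int = 0) -> List[float]:
    """Train an L1-loss linear SVM (no bias term), one block at a time.

    Only w and one scalar dual variable per instance stay in memory; the
    instances themselves are visited block by block. `load_blocks` is a
    hypothetical loader; out of core it would read each block from disk.
    """
    rng = random.Random(seed)
    w = [0.0] * n_features
    alpha: Dict[Tuple[int, int], float] = {}  # dual variables, 0 <= alpha <= C

    for _ in range(outer_iters):
        for b, block in enumerate(load_blocks()):
            order = list(range(len(block)))
            rng.shuffle(order)
            # Dual coordinate descent on the variables of this block only.
            for i in order:
                x, y = block[i]
                q = dot(x, x)
                if q == 0.0:
                    continue
                a_old = alpha.get((b, i), 0.0)
                grad = y * dot(w, x) - 1.0  # partial derivative of the dual objective
                a_new = min(max(a_old - grad / q, 0.0), C)  # project onto [0, C]
                if a_new != a_old:
                    delta = (a_new - a_old) * y
                    for j in range(n_features):
                        w[j] += delta * x[j]  # keep w = sum_i alpha_i y_i x_i
                    alpha[(b, i)] = a_new
    return w


# Toy usage: two "blocks"; the first feature is a constant acting as a bias.
blocks: List[Block] = [
    [([1.0, 2.0], +1), ([1.0, 3.0], +1)],
    [([1.0, -2.0], -1), ([1.0, -3.0], -1)],
]
w = block_minimization_svm(lambda: iter(blocks), n_features=2, C=1.0,
                           outer_iters=50)
print(w)  # close to [0.0, 0.5] for this separable toy set
```

Because each loaded block is optimized before the next one is read, the cost of reading data is spread over many cheap in-memory updates.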
What is the KDD Cup competition?
The KDD Cup is the most important annual competition for data mining practitioners around the world. It has been held alongside the KDD Conference since 1997. Participants design data mining methods to analyze large-scale, real-life data provided by the organizers, with the goal of building better systems for intelligent decision-making. The competition aims not only to raise the technical level of data mining but also to encourage teams to combine data mining theory with practical issues, so that real-world problems can be solved. This year's task was to analyze data from an online math learning system. Teams had to analyze more than 30 million historical records of students learning math online, infer what knowledge each student had acquired, and predict the probability that a student would answer a particular math problem correctly. The KDD Cup attracts many leading teams from academia and industry every year, and NTU teams have stayed ahead of the field in recent years. In 2008, a team led by Professor Shou-De Lin designed an intelligent breast cancer diagnosis system, and its entry was ranked No. 1 that year alongside IBM Research. In 2009, a team jointly led by Professor Shou-De Lin, Professor Hsuan-Tien Lin, and Professor Chih-Jen Lin analyzed commercial data on cell phone sales and successfully predicted consumers' behavioral patterns; their entry won third place in the "long term analysis" group. This year, another team led by the three professors built its approach on "Feature Engineering and Classifier Ensemble," effectively mined the learning-system data provided by the organizers, and won first place in both the student group and the general group.
Feature Engineering and Classifier Ensemble
Professor Chih-Jen Lin and his colleagues jointly launched a new hands-on course, "Data Mining and Machine Learning: Theory and Practice." The course uses data from the KDD Cup and other related contests as its platform, so that students can apply data mining and machine learning methods to real-life data. In this year's class, the 19 students were divided into six groups, and each group competed to come up with innovative ideas characteristic of NTU students. For instance, some groups combined the knowledge content of the "big problems" in the data with the "small steps" of the step-by-step answers to construct features better suited for prediction, while other groups computed each student's rate of correct answers on different problems at different times and used those rates as features measuring the student's degree of understanding. In the final stage of the class competition, the two teaching assistants, Hsiang-Fu Yu and Hong-Yi Lo, used feature engineering and a classifier ensemble to integrate all the groups' classification systems. Because the integrated system combines the varied ideas of every group, it is more likely to perform well. With support from the College of Electrical Engineering and Computer Science and the Department of Computer Science and Information Engineering, the three professors and team members Hsiang-Fu Yu, Chia-Hua Ho, Tao-De Mai, and Earn-Hsu Yen were publicly honored at the luncheon hosted by the conference. Professor Chih-Jen Lin gave a speech on behalf of the NTU team, while the student members presented posters and exchanged ideas with researchers from international academia and industry. Overall, the NTU delegation was highly recognized by international scholars.
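The feature idea described above, using a student's rate of correct answers as a feature, and the final step of merging several groups' predictions can be sketched in Python. This is a minimal illustration under stated assumptions: the record schema, the smoothing prior, and the function names are hypothetical, the time dimension is ignored for brevity, and plain weighted averaging of predicted probabilities stands in for the team's actual ensemble method, which the article does not specify.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical log schema: (student_id, skill_name, answered_correctly as 0/1).
Record = Tuple[str, str, int]


def correctness_rate(history: Sequence[Record], prior: float = 0.5,
                     strength: float = 2.0) -> Dict[Tuple[str, str], float]:
    """Smoothed fraction of correct answers per (student, skill) pair.

    The prior pulls rates for rarely seen pairs toward `prior`, so a single
    lucky or unlucky attempt does not produce an extreme feature value.
    """
    counts = defaultdict(lambda: [0, 0])  # [correct answers, attempts]
    for student, skill, correct in history:
        counts[(student, skill)][0] += correct
        counts[(student, skill)][1] += 1
    return {key: (right + prior * strength) / (attempts + strength)
            for key, (right, attempts) in counts.items()}


def average_ensemble(predictions: List[List[float]],
                     weights: Optional[List[float]] = None) -> List[float]:
    """Combine several models' predicted probabilities by weighted averaging."""
    if weights is None:
        weights = [1.0] * len(predictions)
    total = sum(weights)
    n = len(predictions[0])
    return [sum(wt * p[i] for wt, p in zip(weights, predictions)) / total
            for i in range(n)]


history = [("s1", "fractions", 1), ("s1", "fractions", 1),
           ("s1", "fractions", 0), ("s2", "fractions", 0)]
rates = correctness_rate(history)
# s1: (2 + 1) / (3 + 2) = 0.6;  s2: (0 + 1) / (1 + 2), about 0.333
combined = average_ensemble([[0.2, 0.8], [0.4, 0.6]])
```

Averaging probabilities is the simplest way to merge systems built by different groups; in practice the weights could be tuned on held-out data.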