- title: 'Preface'
  abstract: 'Preface to the Proceedings of the First Workshop on Applications of Pattern Analysis, September 1-3, 2010, Cumberland Lodge, Windsor, UK'
  volume: 11
  URL: https://proceedings.mlr.press/v11/diethe10a.html
  PDF: http://proceedings.mlr.press/v11/diethe10a/diethe10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-diethe10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Nello
    family: Cristianini
  - given: Tom
    family: Diethe
  - given: John
    family: Shawe-Taylor
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 1-3
  id: diethe10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 1
  lastpage: 3
  published: 2010-09-30 00:00:00 +0000
- title: 'O-IPCAC and its Application to EEG Classification'
  abstract: 'In this paper we describe an online/incremental linear binary classifier based on an approach to estimating the Fisher subspace. The proposed method can handle high-cardinality datasets that are supplied dynamically, and it efficiently copes with high-dimensional data without employing any dimensionality reduction technique. Moreover, this approach obtains promising classification performance even when the cardinality of the training set is comparable to the data dimensionality. We demonstrate the efficacy of our algorithm by testing it on EEG data. This classification problem is particularly hard since the data are high-dimensional, the cardinality of the data is lower than the space dimensionality, and the classes are strongly unbalanced.
    The promising results obtained in the MLSP competition, without employing any feature extraction/selection step, demonstrate that our method is effective; this is further supported both by our tests and by comparison with other well-known classifiers.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/rozza10a.html
  PDF: http://proceedings.mlr.press/v11/rozza10a/rozza10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-rozza10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Alessandro
    family: Rozza
  - given: Gabriele
    family: Lombardi
  - given: Marco
    family: Rosa
  - given: Elena
    family: Casiraghi
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 4-11
  id: rozza10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 4
  lastpage: 11
  published: 2010-09-30 00:00:00 +0000
- title: '$\mu$TOSS - Multiple hypothesis testing in an open software system'
  abstract: '$\mu$TOSS is an R package providing an open-source, easy-to-extend platform for multiple hypothesis testing (MHT), one of the most active research fields in statistics over the last 10-15 years. Its primary motivation is to establish a common platform and standardization for MHT procedures at large. The $\mu$TOSS software was designed and written in the framework of a “Harvest Programme” call of the PASCAL2 European research network. It consists of two R packages, mutoss and mutossGUI. For researchers, it offers a convenient unified interface to MHT procedures (including standardized functions to access existing specific MHT R packages such as multtest and multcomp, as well as recent MHT procedures that are not available elsewhere) and helper functions that facilitate setting up benchmark simulations to compare competing methods.
    For end users, a graphical user interface and an online user''s guide for finding appropriate methods for a given specification of the multiple testing problem are included. Ongoing maintenance and subsequent extensions will aim to establish $\mu$TOSS as the state of the art in statistical computing for MHT.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/blanchard10a.html
  PDF: http://proceedings.mlr.press/v11/blanchard10a/blanchard10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-blanchard10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Gilles
    family: Blanchard
  - given: Thorsten
    family: Dickhaus
  - given: Niklas
    family: Hack
  - given: Frank
    family: Konietschke
  - given: Kornelius
    family: Rohmeyer
  - given: Jonathan
    family: Rosenblatt
  - given: Marsel
    family: Scheer
  - given: Wiebke
    family: Werft
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 12-19
  id: blanchard10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 12
  lastpage: 19
  published: 2010-09-30 00:00:00 +0000
- title: 'SubSift: a novel application of the vector space model to support the academic research process'
  abstract: 'SubSift matches submitted conference or journal papers to potential peer reviewers based on the similarity between the paper''s abstract and the reviewer''s publications as found in online bibliographic databases such as Google Scholar. Using concepts from information retrieval, including a bag-of-words representation and cosine similarity, the SubSift tools were originally created to streamline the peer review process for the ACM SIGKDD''09 data mining conference.
    This paper describes how these tools were subsequently developed and deployed as web services designed to support not only peer review but also personalised data discovery and mashups. SubSift has already been used by several major data mining conferences, and interesting applications in other fields are now emerging.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/price10a.html
  PDF: http://proceedings.mlr.press/v11/price10a/price10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-price10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Simon
    family: Price
  - given: Peter A.
    family: Flach
  - given: Sebastian
    family: Spiegler
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 20-27
  id: price10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 20
  lastpage: 27
  published: 2010-09-30 00:00:00 +0000
- title: 'Cross-associating unlabelled timbre distributions to create expressive musical mappings'
  abstract: 'In timbre remapping applications such as concatenative synthesis, an audio signal is used as a template, and a mapping process derives control data for some audio synthesis algorithm such that it produces a new audio signal approximating the perceived trajectory of the original sound. Timbre is a multidimensional attribute with interactions between dimensions, and the control and synthesised signals typically represent sounds with different timbral ranges, so it is non-trivial to design a search process which makes the best use of the timbral variety available in the synthesiser. We first discuss our preliminary work applying standard machine-learning techniques for this purpose (PCA, self-organising maps), and the reasons they were not satisfactory.
    We then describe a novel regression-tree technique which learns associations between unlabelled multidimensional timbre distributions.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/stowell10a.html
  PDF: http://proceedings.mlr.press/v11/stowell10a/stowell10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-stowell10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Dan
    family: Stowell
  - given: Mark D.
    family: Plumbley
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 28-35
  id: stowell10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 28
  lastpage: 35
  published: 2010-09-30 00:00:00 +0000
- title: 'Automating News Content Analysis: An Application to Gender Bias and Readability'
  abstract: 'In this article we present an application of text-analysis technologies to support social science research, in particular the analysis of patterns in news content. We describe a system that gathers and annotates large volumes of textual data in order to extract patterns and trends. We have examined 3.5 million news articles and show that their topic is related to the gender bias and readability of their content. This study is intended to illustrate how pattern analysis technology can be deployed to automate tasks commonly performed by humans in the social sciences, in order to enable large-scale studies that would otherwise be impossible.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/ali10a.html
  PDF: http://proceedings.mlr.press/v11/ali10a/ali10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-ali10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Omar
    family: Ali
  - given: Ilias
    family: Flaounas
  - given: Tijl
    family: De Bie
  - given: Nick
    family: Mosdell
  - given: Justin
    family: Lewis
  - given: Nello
    family: Cristianini
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 36-43
  id: ali10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 36
  lastpage: 43
  published: 2010-09-30 00:00:00 +0000
- title: 'MOA: Massive Online Analysis, a Framework for Stream Classification and Clustering'
  abstract: 'Massive Online Analysis (MOA) is a software environment for implementing algorithms and running experiments for online learning from evolving data streams. MOA is designed to deal with the challenging problem of scaling up the implementation of state-of-the-art algorithms to real-world dataset sizes. It contains a collection of offline and online methods for both classification and clustering, as well as tools for evaluation. In particular, for classification it implements boosting, bagging, and Hoeffding Trees, all with and without Naive Bayes classifiers at the leaves. For clustering, it implements StreamKM++, CluStream, ClusTree, DenStream, D-Stream and CobWeb. Researchers benefit from MOA by gaining insights into the workings and problems of different approaches, while practitioners can easily apply and compare several algorithms on real-world data sets and settings. MOA supports bi-directional interaction with WEKA, the Waikato Environment for Knowledge Analysis, and is released under the GNU GPL license.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/bifet10a.html
  PDF: http://proceedings.mlr.press/v11/bifet10a/bifet10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-bifet10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Albert
    family: Bifet
  - given: Geoff
    family: Holmes
  - given: Bernhard
    family: Pfahringer
  - given: Philipp
    family: Kranen
  - given: Hardy
    family: Kremer
  - given: Timm
    family: Jansen
  - given: Thomas
    family: Seidl
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 44-50
  id: bifet10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 44
  lastpage: 50
  published: 2010-09-30 00:00:00 +0000
- title: 'Pinview: Implicit Feedback in Content-Based Image Retrieval'
  abstract: 'This paper describes Pinview, a content-based image retrieval system that exploits implicit relevance feedback during a search session. Pinview contains several novel methods that infer the intent of the user. From relevance feedback, such as eye movements or clicks, and from visual features of images, Pinview learns a similarity metric between images which depends on the current interests of the user. It then retrieves images with a specialized reinforcement learning algorithm that balances the tradeoff between exploring new images and exploiting the already inferred interests of the user. In practice, we have integrated Pinview into the content-based image retrieval system PicSOM in order to apply it to real-world image databases. Preliminary experiments show that eye movements provide a rich input modality from which it is possible to learn the interests of the user.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/auer10a.html
  PDF: http://proceedings.mlr.press/v11/auer10a/auer10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-auer10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Peter
    family: Auer
  - given: Zakria
    family: Hussain
  - given: Samuel
    family: Kaski
  - given: Arto
    family: Klami
  - given: Jussi
    family: Kujala
  - given: Jorma
    family: Laaksonen
  - given: Alex P.
    family: Leung
  - given: Kitsuchart
    family: Pasupa
  - given: John
    family: Shawe-Taylor
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 51-57
  id: auer10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 51
  lastpage: 57
  published: 2010-09-30 00:00:00 +0000
- title: 'Handwritten Text Recognition for Ancient Documents'
  abstract: 'Huge amounts of legacy documents are being published by online digital libraries worldwide. However, for these raw digital images to be really useful, they need to be transcribed into a textual electronic format that would allow unrestricted indexing, browsing and querying. In some cases, adequate transcriptions of the handwritten text images are already available. In this work, three systems are presented to deal with these kinds of documents. The first two address two different approaches to the semi-automatic transcription of document images. The third system implements an alignment method to find mappings between the word images of a handwritten document and their respective words in its given transcription.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/juan10a.html
  PDF: http://proceedings.mlr.press/v11/juan10a/juan10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-juan10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Alfons
    family: Juan
  - given: Verónica
    family: Romero
  - given: Joan Andreu
    family: Sánchez
  - given: Nicolás
    family: Serrano
  - given: Alejandro H.
    family: Toselli
  - given: Enrique
    family: Vidal
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 58-65
  id: juan10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 58
  lastpage: 65
  published: 2010-09-30 00:00:00 +0000
- title: 'Assessment of Cow’s Body Condition Score Through Statistical Shape Analysis and Regression Machines'
  abstract: 'This study explores the feasibility of estimating the Body Condition Score (BCS) of cows from digital images by employing statistical shape analysis and regression machines. The body shapes of cows are described through a number of variations from a unique average shape. Specifically, Kernel Principal Component Analysis is used to determine the components describing the many ways in which the body shapes of different cows tend to deform from the average shape. This description is used for automatic estimation of BCS through a regression approach. The proposed method has been tested on a new benchmark dataset available through the Internet. Experimental results confirm the effectiveness of the proposed technique, which outperforms the state-of-the-art approaches proposed in the context of dairy cattle research.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/battiato10a.html
  PDF: http://proceedings.mlr.press/v11/battiato10a/battiato10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-battiato10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Sebastiano
    family: Battiato
  - given: Giovanni Maria
    family: Farinella
  - given: Giuseppe Claudio
    family: Guarnera
  - given: Giovanni
    family: Puglisi
  - given: Giuseppe
    family: Azzaro
  - given: Margherita
    family: Caccamo
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 66-73
  id: battiato10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 66
  lastpage: 73
  published: 2010-09-30 00:00:00 +0000
- title: 'Closure-Based Confidence Boost in Association Rules'
  abstract: 'We focus on association rule mining. It is well known that naive miners often end up providing far too many mined associations to be actually useful in practice. Many proposals exist for selecting appropriate association rules, trying to measure their interest in various ways; most of these approaches are statistical in nature, or share their main traits with statistical notions. Alternatively, some existing notions of redundancy among association rules allow for a logical-style characterization and lead to irredundant bases (axiomatizations) of absolutely minimum size. Here we follow up on a study of closure-based redundancy, which, in practice, leads to smaller bases than simpler alternative forms of redundancy, with the proviso that, in principle, these bases need to be complemented with an implicational basis. One can push the intuition of redundancy further and gain a perspective on the interest of association rules in terms of their “novelty” with respect to other rules.
    An irredundant rule is so because its confidence is higher than what the rest of the rules would suggest; one can then ask: how much higher? Among several variants, a recently proposed parameter, the confidence boost, succeeds in measuring a notion of novelty along these lines in a way that better fits the needs of practical applications. However, that notion is based on plain redundancy, which is of relatively limited practical usefulness. Here we extend the confidence boost to closure-based redundancy, paying a small theoretical price to obtain several advantages in practical applications. We describe a rule-mining system implementing this contribution.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/balcazar10a.html
  PDF: http://proceedings.mlr.press/v11/balcazar10a/balcazar10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-balcazar10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: José L.
    family: Balcázar
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 74-80
  id: balcazar10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 74
  lastpage: 80
  published: 2010-09-30 00:00:00 +0000
- title: 'HMMPayl: an application of HMM to the analysis of the HTTP Payload'
  abstract: 'Zero-day attacks are among the most dangerous threats against computer networks. By definition, they are attacks never seen before. Thus, defense tools based on a database of rules (usually referred to as “signatures”) that describe known attacks cannot do anything against them. Recently, defense tools based on machine learning algorithms have gained increasing popularity, as they offer the possibility of also fighting off zero-day attacks.
    In this paper we propose HMMPayl, an anomaly-based Intrusion Detection System for the protection of a web server and of the applications the server hosts. HMMPayl is based on Hidden Markov Models and analyzes the network traffic directed toward the web server. This paper makes several contributions. First, the algorithm implemented by HMMPayl carefully models the payload, increasing the classification accuracy with respect to previously proposed solutions. Second, we show that an approach based on multiple classifiers leads to increased classification accuracy with respect to the case where a single classifier is used. Third, by exploiting the redundancy within the information extracted from the payload, we propose a solution to reduce the computational cost of the algorithm.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/ariu10a.html
  PDF: http://proceedings.mlr.press/v11/ariu10a/ariu10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-ariu10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Davide
    family: Ariu
  - given: Giorgio
    family: Giacinto
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 81-87
  id: ariu10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 81
  lastpage: 87
  published: 2010-09-30 00:00:00 +0000
- title: 'Maximum Margin Learning with Incomplete Data: Learning Networks instead of Tables'
  abstract: 'In this paper we address the problem of prediction when the available data are incomplete. We show that changing the generally accepted table-wise view of the sample items into a graph-representable one allows us to solve these kinds of problems in a very concise way by using the well-known convex optimisation framework based on one-class classification.
    Using the one-class formulation in both the learning and the prediction phases makes the entire procedure highly consistent. The graph representation can express the complex interdependencies among the data sources. The underlying optimisation problem can be transformed into an online algorithm, e.g. a perceptron-type one, and in this way it can deal with data sets of millions of items. This framework encompasses supervised, semi-supervised and some unsupervised learning problems. Furthermore, the data sources are not limited to simple binary variables or vectors; they can also be text documents, images or even graphs with complex internal structures.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/szedmak10a.html
  PDF: http://proceedings.mlr.press/v11/szedmak10a/szedmak10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-szedmak10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Sandor
    family: Szedmak
  - given: Yizhao
    family: Ni
  - given: Steve R.
    family: Gunn
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 96-102
  id: szedmak10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 96
  lastpage: 102
  published: 2010-09-30 00:00:00 +0000
- title: 'Interactive Pattern Recognition and Human Language Technology for Digital Audiovisual Content Processing'
  abstract: 'This paper describes ongoing research work by the Pattern Recognition and Human Language Technology (PRHLT) group (UPV PASCAL2 node) in two important technology transfer projects: i3media and erudito.com. On the one hand, i3media (2007-2010) is a 35M€ “tractor” technology project within the Spanish Programa CENIT-Ingenio 2010, run through a consortium of 12 main enterprises of the media sector, which also involves 19 research groups, including PRHLT.
    i3media focuses on the creation and automated management of intelligent audiovisual content, so as to facilitate both content personalisation and interaction with users (i3media.barcelonamedia.org). Our participation in i3media is centred on interactive machine translation, transferring and adapting our experience with this technology to i3media-specific needs. On the other hand, erudito.com (2010-2012) is a 1.4M€ experimental design project, supported by the Spanish Ministry of Industry, Tourism and Trade under the Avanza I+D program, aimed at developing a tool to encapsulate, distribute and intelligently use digital content such as that shown on thematic TV channels. In this project, PRHLT contributes to the development of interactive closed captioning (speech transcription) and machine translation tools.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/lagarda10a.html
  PDF: http://proceedings.mlr.press/v11/lagarda10a/lagarda10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-lagarda10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Antonio
    family: Lagarda
  - given: Jorge
    family: Civera
  - given: Alfons
    family: Juan
  - given: Francisco
    family: Casacuberta
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 103-110
  id: lagarda10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 103
  lastpage: 110
  published: 2010-09-30 00:00:00 +0000
- title: 'Facial Expression Detection using Filtered Local Binary Pattern Features with ECOC Classifiers and Platt Scaling'
  abstract: 'We outline a design for a FACS-based facial expression recognition system and describe in more detail the implementation of two of its main components.
    Firstly, we look at how features that are useful from a pattern analysis point of view can be extracted from a raw input image. We show that good results can be obtained by using the method of local binary patterns (LBP) to generate a large number of candidate features and then selecting from them using fast correlation-based filtering (FCBF). Secondly, we show how Platt scaling can be used to improve the performance of an error-correcting output code (ECOC) classifier.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/smith10a.html
  PDF: http://proceedings.mlr.press/v11/smith10a/smith10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-smith10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Raymond S.
    family: Smith
  - given: Terry
    family: Windeatt
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 111-118
  id: smith10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 111
  lastpage: 118
  published: 2010-09-30 00:00:00 +0000
- title: 'Gap Between Theory and Practice: Noise Sensitive Word Alignment in Machine Translation'
  abstract: 'Word alignment is the task of estimating a lexical translation probability $p(e|f)$, or the correspondence $g(e,f)$, where the function $g$ outputs either 0 or 1, between a source word $f$ and a target word $e$ for given bilingual sentences. In practice, this formulation does not consider the existence of ‘noise’ (or outliers), which may cause problems depending on the corpus. $N$-to-$m$ mapping objects, such as paraphrases, non-literal translations, and multi-word expressions, may appear both as noise and as valid training data.
    From this perspective, this paper tries to answer the following two questions: 1) how to detect stable patterns where noise seems legitimate, and 2) how to reduce such noise, where applicable, by supplying extra information as prior knowledge to a word aligner.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/okita10a.html
  PDF: http://proceedings.mlr.press/v11/okita10a/okita10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-okita10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Tsuyoshi
    family: Okita
  - given: Yvette
    family: Graham
  - given: Andy
    family: Way
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 119-126
  id: okita10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 119
  lastpage: 126
  published: 2010-09-30 00:00:00 +0000
- title: 'Modeling Knowledge Worker Activity'
  abstract: 'This paper describes an approach to constructing a probabilistic process model representing knowledge worker activity from a log of primitive events, such as e-mails, web page visits and document accesses. Firstly, we present the process of enriching the primitive events into abstract actions, executed in different contexts. We explain how both the context and the action are obtained for each event by clustering the events via two different views. Secondly, we present an application of probabilistic deterministic finite automata to model the transitions between consecutive actions within the same context. We demonstrate the approach on real-world knowledge worker data, with the aim of understanding knowledge processes and showing that it is feasible to construct a process model from low-level events.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/stajner10a.html
  PDF: http://proceedings.mlr.press/v11/stajner10a/stajner10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-stajner10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Tadej
    family: Štajner
  - given: Dunja
    family: Mladenić
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 127-133
  id: stajner10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 127
  lastpage: 133
  published: 2010-09-30 00:00:00 +0000
- title: 'Visualization of Online Discussion Forums'
  abstract: 'This paper describes a set of visualization tools which aid the understanding of discussion topics and trends in online discussion forums. The tools integrate into the forum’s web page, allowing for easy exploration of its contents. Three visualizations are presented: a visual browsing suggestions mechanism, a semantic “atlas” providing a thematic overview of larger forum segments, and a timeline displaying the temporal evolution of forum topics. The underlying algorithms have very few language-dependent components. The software is operational and can be tested live on Slovene, Slovak and Hungarian pilot sites containing up to 5 million forum posts.'
  volume: 11
  URL: https://proceedings.mlr.press/v11/trampus10a.html
  PDF: http://proceedings.mlr.press/v11/trampus10a/trampus10a.pdf
  edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-trampus10a.md
  series: 'Proceedings of Machine Learning Research'
  container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis'
  publisher: 'PMLR'
  author:
  - given: Mitja
    family: Trampuš
  - given: Marko
    family: Grobelnik
  editor:
  - given: Tom
    family: Diethe
  - given: Nello
    family: Cristianini
  - given: John
    family: Shawe-Taylor
  address: Cumberland Lodge, Windsor, UK
  page: 134-141
  id: trampus10a
  issued:
    date-parts:
    - 2010
    - 9
    - 30
  firstpage: 134
  lastpage: 141
  published: 2010-09-30 00:00:00 +0000
- title: 'A Novel Hybrid Feature Selection Method Based on IFSFFS and SVM for the Diagnosis of Erythemato-Squamous Diseases'
  abstract: 'This paper develops a diagnosis model based on Support Vector Machines (SVM) with a novel hybrid feature selection method to diagnose erythemato-squamous diseases. Our hybrid feature selection method, named IFSFFS (Improved F-score and Sequential Forward Floating Search), combines the advantages of filters and wrappers to select the optimal feature subset from the original feature set. In IFSFFS, we first generalize the original F-score to an improved F-score that measures the discrimination of more than two sets of real numbers. We then combine Sequential Forward Floating Search (SFFS) with our improved F-score to select the optimal feature subset. Here, the improved F-score is the evaluation criterion for the filter, while SFFS and SVM compose the evaluation system of the wrapper. The best parameters of the SVM kernel function are found by grid search with ten-fold cross-validation. Experiments have been conducted on five random training-test partitions of the erythemato-squamous diseases dataset from the UCI Machine Learning Repository.
The experimental results show that our SVM-based model with IFSFFS achieves the optimal classification accuracy while using no more than 14 features.' volume: 11 URL: https://proceedings.mlr.press/v11/xie10a.html PDF: http://proceedings.mlr.press/v11/xie10a/xie10a.pdf edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-xie10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis' publisher: 'PMLR' author: - given: Juanying family: Xie - given: Weixin family: Xie - given: Chunxia family: Wang - given: Xinbo family: Gao editor: - given: Tom family: Diethe - given: Nello family: Cristianini - given: John family: Shawe-Taylor address: Cumberland Lodge, Windsor, UK page: 142-151 id: xie10a issued: date-parts: - 2010 - 9 - 30 firstpage: 142 lastpage: 151 published: 2010-09-30 00:00:00 +0000 - title: 'Learning to Rank for Personalized News Article Retrieval' abstract: 'This paper tackles the important problem of personalized ranking of search results. The focus is on news retrieval, and the data from which the ranking model is learned was provided by a large online newspaper. The personalized news search ranking model we developed takes into account not only document content and metadata but also user-specific data such as age, gender, job, income, city and country. All user-specific data is supplied by the users themselves when they register with the news site.'
volume: 11 URL: https://proceedings.mlr.press/v11/dali10a.html PDF: http://proceedings.mlr.press/v11/dali10a/dali10a.pdf edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-dali10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis' publisher: 'PMLR' author: - given: Lorand family: Dali - given: Blaž family: Fortuna - given: Jan family: Rupnik editor: - given: Tom family: Diethe - given: Nello family: Cristianini - given: John family: Shawe-Taylor address: Cumberland Lodge, Windsor, UK page: 152-159 id: dali10a issued: date-parts: - 2010 - 9 - 30 firstpage: 152 lastpage: 159 published: 2010-09-30 00:00:00 +0000 - title: 'Detection of Server-side Web Attacks' abstract: 'Web servers and server-side applications are key components of modern Internet services. We present a pattern recognition system for detecting intrusion attempts that target such components. Our system is anomaly-based: we model normal (legitimate) traffic, and intrusion attempts are identified as anomalous traffic. To cope with attacks (noise) present in the training set, we employ an ad-hoc outlier detection technique. This approach requires no supervision and allows us to accurately detect both known and unknown attacks against web services.'
volume: 11 URL: https://proceedings.mlr.press/v11/corona10a.html PDF: http://proceedings.mlr.press/v11/corona10a/corona10a.pdf edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-corona10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis' publisher: 'PMLR' author: - given: Igino family: Corona - given: Giorgio family: Giacinto editor: - given: Tom family: Diethe - given: Nello family: Cristianini - given: John family: Shawe-Taylor address: Cumberland Lodge, Windsor, UK page: 160-166 id: corona10a issued: date-parts: - 2010 - 9 - 30 firstpage: 160 lastpage: 166 published: 2010-09-30 00:00:00 +0000 - title: 'Multiple Kernel Learning on the Limit Order Book' abstract: 'Simple features derived from order book data for the EURUSD currency pair were used to construct a set of kernels. These kernels were used both individually and simultaneously, through the Multiple Kernel Learning (MKL) methods SimpleMKL and the newer LPBoostMKL, to train multiclass Support Vector Machines that predict the direction of future price movements. The kernel methods outperformed a trend-following benchmark both in predictive ability and when used in a simple trading rule. Furthermore, the kernel weightings selected by the MKL techniques highlight which features of the EURUSD order book are the most informative for predictive tasks.'
volume: 11 URL: https://proceedings.mlr.press/v11/fletcher10a.html PDF: http://proceedings.mlr.press/v11/fletcher10a/fletcher10a.pdf edit: https://github.com/mlresearch//v11/edit/gh-pages/_posts/2010-09-30-fletcher10a.md series: 'Proceedings of Machine Learning Research' container-title: 'Proceedings of the First Workshop on Applications of Pattern Analysis' publisher: 'PMLR' author: - given: Tristan family: Fletcher - given: Zakria family: Hussain - given: John family: Shawe-Taylor editor: - given: Tom family: Diethe - given: Nello family: Cristianini - given: John family: Shawe-Taylor address: Cumberland Lodge, Windsor, UK page: 167-174 id: fletcher10a issued: date-parts: - 2010 - 9 - 30 firstpage: 167 lastpage: 174 published: 2010-09-30 00:00:00 +0000