Toward Recommendation Based on Ontology-Powered Web-Usage Mining

of 8
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Information Report
Category:

Psychology

Published:

Views: 18 | Pages: 8

Extension: PDF | Download: 0

Share
Description
Toward Recommendation Based on Ontology-Powered Web-Usage Mining
Tags
Transcript
  Mehdi Addaand Petko Valtchev University of Montreal  Rokia Missaoui University of Quebec in Outaouais Chabane Djeraba University of Lille,France JULY • AUGUST 20071089-7801/07/$25.00 © 2007 IEEEPublished by the IEEEComputer Society45    D  a   t  a   M   i  n   i  n  g Toward RecommendationBased on Ontology-Powered Web-Usage Mining Content adaptation on the Web reduces available information to a subset thatmatches a user’s anticipated needs. Recommender systems rely on relevancescores for individual content items; in particular, pattern-based recommendationexploits co-occurrences of items in user sessions to ground any guesses aboutrelevancy. To enhance the discovered patterns’ quality, the authors propose usingmetadata about the content that they assume is stored in a domain ontology.Their approach comprises a dedicated pattern space built on top of the ontology,navigation primitives, mining methods, and recommendation techniques. T oday, Web users are submerged inall kinds of available information, yet only a tiny part of it is usually relevant to their preferences. Content adaptation 1 aims to guide users towarditems of interest while reducing the pro-posed content’s quantity and variety.Recommender systems perform activeadaptation by matching content objectsagainst a user’s anticipated needs basedon relevance. Such a system approxi-mates the unknown object-to-user rele- vance from information about similar users or objects, whereas pattern miningrelies on an analysis of past experiencesregistered in user logs to predict the best“next” item.Our association-based recommenda-tion approach uses click-stream analysis 2 powered by an explicit representation of domain knowledge. We hypothesize thatsuch knowledge should help increaseboth the relevance and interpretability of discovered patterns. Thus, we assume adomain ontology — in this article, e-tourism — that describes content objectsas instances of generic categories (such ascities, museums, beaches, and so on) with various relations (such as location, prox-imity, and so on). Consequently, insteadof raw click-streams, we process content-object sequences enhanced with seman-tic links. The discovered patterns arethemselves sequences of domain con-  cepts; the system uses them in the recommenda-tion process as plausible scenarios to matchagainst ongoing user sessions.The resulting mining problem combines aspectsof structured  and generalized  pattern discovery,which, to the best of our knowledge, haven’t beenstudied before in this context. The problem’s key difficulty is the patterns’ structural complexity,which reflects both the ontology’s is-a hierarchy and its network of properties. A previous work  3 introduced a plausible solution’s key elements —namely, a pattern space and an algorithm thatmines it. This article describes a complete recom-mendation framework and provides empirical evi-dence about the mining method’s robustness. Pattern-Based Recommendation Pattern-based recommendation uses the patternsidentified by mining previous user session logs. By tracing these patterns, the system can guess, withhigher probability, a user’s “next step” and hencesuggest relevant content. To that end, it matches thesequence of content objects already visited by theuser against the pattern set and finds all patternshaving a prefix corresponding to that sequence. Theremainder of such a pattern is a hint as to the way the session might unfold. Pattern Mining Given a universe of items O , a database D of records that combine items from O , and a frequen-cy threshold   , pattern mining amounts to extract-ing the family F    of the patterns present in at least   records. Two languages (the pattern language   and the data language  ) and two binary relationsunderlie any instance of the  frequent pattern min-ing problem: the generality  between patterns    and the instantiation between a data record and apattern  . Generality follows instantiation: givena pattern  f     and a super-pattern thereof (  f     ), each record d    instantiating  f  ( d    f  )instantiates as well.In the simplest settings, both data records andpatterns are sets of items ( itemsets ) — that is,  =   =2 O — whereas    and  boil down to set-theo-retic inclusion. Hence, the mining goal amounts tofinding all the frequent subsets of a family of sets D   .Researchers have studied more elaboraterecord structures, 4 including sequences (  =   =   O ) and graphs 5 with equally more complex gen-erality and instantiation (subsequence and sub-graph relations, respectively). A somewhat orthog-onal research axis looks for generalized patternlanguages 6 built on top of a set of concepts C  abstracting objects from O (that is,   derived from2 C  instead of 2 O ). Concepts form a taxonomy, H  =  C  ,   , where  is the is-a relationship (for exam-ple, the hierarchical catalogue of products soldthrough an e-commerce Web site). A taxonomy is,in fact, a simple domain ontology.Pattern-mining methods search for F    through the pattern space   ,     by exploringthe monotony in frequency with regard to    . Apriori is the prototypical pattern miner thatperforms a level-wise top-down traversal of   ,     . 7 On itemsets, it examines patterns at level k — that is, of size k — on two points: the methodcomputes frequency of candidate patterns by matching them against the records in D , where-as it generates k + 1 candidates by combiningpairs of frequent k -patterns. Because scanningthe entire database is expensive, Apriori exploits frequency scores for a priori invalidationof infrequent candidates. Independent studieshave successfully adapted the level-wise miningapproach to both structured and generalized pat-terns, but our work is the first attempt we knowof to port it to ontology-based patterns. Pattern-Based Recommendation Patterns support the determination of the itemssuggested as the next step in a user session. Morespecifically, the recommender system matchesthe members of F    to the sequence g of already accessed objects to isolate the applicablepatterns. The system further analyzes the select-ed patterns to yield plausible continuations of the session.Pattern-based recommenders are either item-or concept-based, relying on individual and gen-eralized patterns, respectively. 8 The former yield very topical recommendations because they sug-gest all yet-unvisited objects strongly associatedwith g but experiences difficulties when usersaccess “rare” or new objects. The latter systemswork on different granularity levels and henceoffer more flexible recommendations. Indeed,instead of objects from g , their concepts arematched against F    in the search for associations,so the suggestion encompasses the extensions of all associated concepts. Such a method limits theimpact of specific object rareness but tends to missthe existing strong object associations.  f  f  f  46 www.computer.org/internet/ IEEE INTERNET COMPUTING Data Mining   Feeding a Full-ScaleOntology into the Process To achieve a better trade-off between recommen-dation flexibility and precision, our approach feedsthe mining process with knowledge about seman-tic links between objects. Our basic assumption isthat co-occurrences between objects often reflectthe existence of a link between them (clicks on aParis page, for example, are usually followed by clicks on pages of Parisian hotels). Hence, manip-ulating links explicitly can increase the focus of concept-based recommendation while preservingits flexibility.Because ontologies constitute the standard way of representing domain concepts and relations, weassume that an ontology powers the target recom-mendation system. In our settings, the ontology has two is-a taxonomies — concepts and relations— whose elements are connected by an attribution relation. Figure 1 provides a partial view of the Travel  ontology (see http://protege.cim3.net/file/pub/ontologies/travel/travel.owl). Formally, anontology is a four-tuple,  =  C  , R  ,   ,    , where C  is the set of concepts, or classes , and R  is the setof domain relations, or  properties . Figure 1 depictsthese as two-section rectangles and ovals, respec-tively (for example, LuxuryHotel  is a class and hasAccommodation is a property). The generality order   holds both for concepts and properties(     ( C     C     R     R  )); plain arrows depict thegenerality links (such as LuxuryHotel      Accom-modation and hasLuxuryHotel     hasAccommoda-tion ). Attribution is a ternary relation      C     R    C  , which connects a property r  to classes c  and, whose instances are, respectively, srcin anddestination of the links of type r  . In Figure 1, attri-bution is expressed through domain - and range  -labeled dashed arrows pointed to a property. Wedefine the precedence relation on top of  asa transitive reduction of   (for example, Luxury-Hotel Hotel  and hasLuxuryHotel hasHotel  ). We assume that descriptions of class and prop-erty instances are organized into a separate knowl-edge base. Formally, let K  =  O ,  ,   O  be such a ≺ Ω ≺ Ω ≺ Ω c  JULY • AUGUST200747 Ontology-Powered Web-Usage Mining  Figure 1.Partial view of the Travel ontology.The rectangular nodes represent classes (domain concepts),and the shapeswith rounded angles represent concept properties (roles).Dashed arrows represent domain and range links betweenconcepts and properties,and simple arrows represent the binary relationship is-a between both concepts and properties. hasActivityCityRa ng eRa ng eRa ng eRa ng eRa ng eRa ng eRa ng eRa ng eAcco mm odatio n BedA n dBreak  f  asthasL u x u ryHotelCapitalTow n Co un tryFar m la n dNatio n alPark  R u ralArea Acco mm odatio n Rati ng Urba n Area L    u x  u r   y H o t   el    H o t   el     Sa f  ariRa ng eDo m ai n Do m ai n Do m ai n Do m ai n Do m ai n Do m ai n Do m ai n  Do m ai n Do m ai n Do m ai n h   a  s A  c  c  o mm  od   a  t  i    o n hasAdve n t u re h   a  s  S  i      g  h   t   s  e ei    n  g  Adve n t u rehasM u se um h   a  s  S    p or  t   Desti n atio n Co n tactBeachS u r f  Ra ng eSports A  c  t  i    v i    t    y  hasS u r f  M  u  s  e  um Si g htseei ng hasHotelhasSa f  ari  base, where O stands for objects and  for theinstantiation relation; Table 1 provides a set of  links. In Figure 2, single-section rectangles repre-sent objects, whereas instantiation remains implic-it (for example, Paris instantiates Capital  ). Further-more,  O expresses interobject links (  O   O  R   O ) in a way similar to  (for example, hasLuxury-Hotel  links between Paris and Ritz  ); Table 2 pro- vides a set of attribution links. Mining Ontology-Based Patterns The knowledge encoded in  and K  induces acharacteristic pattern space with a specific min-ing discipline. Descriptive Languages and Pattern Space Our data language   is derived from sequencesof page URIs translated into object IDs from K  .Table 3 gives the set of object sequences compris-ing our running example.Each sequence is further extended with all thelinks from  O exclusively involving objects fromthe sequence. The resulting structures are labeleddigraphs, in which vertices are objects and edgeseither represent semantic links or sequential order.Formally,   is made of pairs s = (   ,   ), where      O is an object sequence and is a set of links. For conciseness, we simplify the triplets ineach s .   from ( o d  , r  , o r  ) to r  ( l  , m ), where l  and m arenatural numbers denoting the ranks of o d  and o r  ,respectively, within s .   . Table 4 illustrates the   completions of the sequences in Table 3. Current-ly, s = (   ,   ) requires that all links in s .   be colinear with the sequence s .   — that is, r  ( l  , m )   s .   , l    m .The language    has a structure identical tothat of   : a pattern S      is a pair of asequence and a triplet set, (   ,   ). However, here      C  and    2  follow the ontology rather thanthe knowledge base. We also impose the follow-ing constraints:1.no class appears at two neighbor positions in S  .   (  i  [2, | S  .   |], S  .   [ i ]   S  .   [ i – 1]), and2.no redundancy in links (  r  ( l  , m ), r   ( l  , m )  S  .   , r    r  ).Instantiation between data records and patternsrelies on graph morphism — that is, a mappingbetween two graphs that preserves edges. Intuitive-ly, s    instantiates S      , denoted ,whenever a substructure of s exists that could behomomorphically mapped onto S  . Formally,if a partial integer map  :[1, | s .   |]  [1,| S  .   |] exists that satisfies these constraints:•   is an order-preserving and surjective function, s  Γ  Ω S  s  Γ  Ω S   Ω θ   ρ  ∈ 2  O 48 www.computer.org/internet/ IEEE INTERNET COMPUTING Data Mining  Figure 2.Sample sequences of ontology entities.On top,we see asimple sequence together with its extended version (obtained by incorporating all existing links from the knowledge base).Below,theupper pattern is more specific than the lower one. Ritz s ´ 1 Capital sS ´Do m ai n Do m ai n Do m ai n Do m ai n Ra ng eRa ng eRa ng eRa ng eRa ng eFra n ce Gre n oble Paris Lo u vreExte n tio n  o f   do m ai n RitzFra n ce Gre n oble Paris Lo u vrehasL u x u ryHotelhasM u se um hasL u x u ryHotelhasM u se um M u se um L u x u ryHotelGe n eralizatio n  o f  Desti n atio n  Acco mm odatio n hasAcco mm odatio n Table 1.Partial extensions for some Travel classes. Class Instances Country France, Australia, ItalyCapital Paris, RomaCity Grenoble, Cairns, SydneyMuseum Louvre, Castel_Sant’AngeloLuxuryHotel Ritz, Grand_Hotel_Plaza, SwissotelBeach Clifton_Beach, Bondi_BeachSafari Cape_York_SafariSurf Bondi_Beach_Surf   •   i   [1, | s .   |], if  ( i ) is defined, then s .   (  ( i ))  S  .   ( i ), and•   r  ( l  , m )   S  .   , r   ( l   , m  )   s .   , such that r     and  ( l   ) = l  ,  ( m  ) = m .The generality order on    follows ,so a pattern S  1 is a superpattern of another one, S  2 , if every data sequence that instantiates S  2 instantiates S  1 . Given the Travel  ontology and theknowledge base composed of Tables 1 and 2, for example, we could check that each sequenceinstantiating S  from Figure 2 also instantiates S   .However, the inverse isn’t true, so S  is strictly more general than S  .Because instantiation-based generality tests areunpractical, we can use an alternative computa-tion mechanism that mimics the homomorphicmapping in instantiation tests. The only differencewith instantiation resides in the use of interclassspecialization instead of object-to-class instantia-tion links. In the example from Figure 2, the func-tion  for S  and S   is {(1, 1), (2, 2)}. Moreover, wecan define a level in the new space as the set of patterns that have the same generality degreereflecting the generality degree of pattern compo-nents, classes, and properties within the ontology.Thus, we introduce the generality rank meas-ure ( rank :       ), which is based on the depthof classes/properties ( h : C   R    ) — that is, thelength of a maximal path from a component to aroot of its respective is-a hierarchy (for example, h ( LuxuryHotel  ) = 3, as in Figure 1). A pattern’s rank is the sum of its component depths (in Figure 2, rank ( S   ) = 3 and rank ( S  ) = 16). The level k is thusmade of all patterns of rank k . Level-wise descent relies on the generation of k + 1-level patterns from k -level ones, whichamounts to the computation of the precedenceon    (transitive reduction of ). The tar-get operations clearly increase the pattern rank by 1, and they’re easily translated in terms of compo-nent precedence. The list includes adding a newclass/property tuple and specializing an existingclass in the pattern to a subclass or replacing aproperty by one of its subproperties. For instance,we can get S  16 from S  9 by specializing the Desti-nation class to UrbanArea . Although these operations constitute an effec-tive generation mechanism, their unrestricted useis redundancy-prone and hence would harm theefficiency of the global mining process. As anillustration, take S  16 in Table 7, which can beobtained from both S  9 and S  8 (given in Table 6) by class substitution and insertion, respectively. Although it might not be possible to completely eliminate this kind of redundancy, we can greatly reduce it by setting the position where operationsapply. Thus, only canonical operations — that is,those involving the last class of S  .   — are allowed.In other terms, classes can be appended only at theend of S  .   and substituted to the last class. Simi-larly, triplets r  ( l  , m ) involved in insertion/substi-tution satisfy m = | S  .   |. In the new settings, S  16 cannot be canonically generated from S  9 .Canonical operations, when chained, result instronger forms of pattern generalities — called pre-fix relations — that support recommendation. Moreprecisely, S   is a prefix of S   whenever S   is deriv-able from S   by a series of canonical operations. Mining Method The pattern-mining problem now looks like this:given  , K  , and D    ,   , find F         . As a first step toward a resolution, we proposethe xPMiner method, which combines the controlstructure of Apriori with the use of canonicaloperations. More specifically, xPMiner (see Algo-rithm 1) performs a level-wise descent in    ,  .  Γ  Ω  Γ  Ω  Γ  Ω  Γ  Ω  Γ  Ω JULY • AUGUST200749 Ontology-Powered Web-Usage Mining  Table 2.A subset of interobject links. PropertyObject pairs hasMuseum (Paris, Louvre), (Roma, Castel_Sant’Angelo)hasLuxuryHotel (Paris, Ritz), (Roma, Grand_Hotel_Plaza), (Sydney, Swissotel)hasSafari (Cairns, Cape_York_Safari)hasSurf (Sydney, Bondi_Beach_Surf) Table 3.Initial object sequences. IDObject sequences (   ) s  1   France, Grenoble, Paris, Louvre, Ritz  s  2   Australia, Cairns, Clifton_Beach, Cape_York_Safari  s  3   Italy, Roma, Castel_Sant’Angelo, Grand_Hotel_Plaza  Table 4.Relational completions of object sequences. IDRelation triplet sets (   ) s  1  {hasLuxuryHotel(3,5), hasMuseum(3,4)} s  2  {hasSafari(2,4)} s  3  {hasLuxuryHotel(2,4), hasMuseum(2,3)}
Recommended
View more...
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks