Topic: MATLAB implementation of the ID3 algorithm [repost]
Category: Data Mining
DMman (moderator, W3CHINA.ORG):

MATLAB implementation of the ID3 algorithm [repost]

    function D = ID3(train_features, train_targets, params, region)

    % Classify using Quinlan's ID3 algorithm
    % Inputs:
    %   train_features - training features (one column per sample)
    %   train_targets  - training targets (class labels)
    %   params         - [number of bins for the data, percentage of
    %                     incorrectly assigned samples allowed at a node]
    %   region         - decision region vector: [-x x -y y number_of_points]
    %
    % Outputs:
    %   D - decision surface

    [Ni, M]    = size(train_features);

    %Get parameters
    [Nbins, inc_node] = process_params(params);
    inc_node    = inc_node*M/100;

    %For the decision region
    N           = region(5);
    mx          = ones(N,1) * linspace(region(1), region(2), N);
    my          = linspace(region(3), region(4), N)' * ones(1,N);
    flatxy      = [mx(:), my(:)]';

    %Preprocessing
    [f, t, UW, m]      = PCA(train_features, train_targets, Ni, region);
    train_features  = UW * (train_features - m*ones(1,M));
    flatxy          = UW * (flatxy - m*ones(1,N^2));

    %First, bin the data and the decision region data
    [H, binned_features]= high_histogram(train_features, Nbins, region);
    [H, binned_xy]      = high_histogram(flatxy, Nbins, region);

    %Build the tree recursively
    disp('Building tree')
    tree        = make_tree(binned_features, train_targets, inc_node, Nbins);

    %Make the decision region according to the tree
    disp('Building decision surface using the tree')
    targets = use_tree(binned_xy, 1:N^2, tree, Nbins, unique(train_targets));

    D = reshape(targets,N,N);
    %END

    function targets = use_tree(features, indices, tree, Nbins, Uc)
    %Classify recursively using a tree

    targets = zeros(1, size(features,2));

    if (size(features,1) == 1),
        %Only one dimension left, so work on it
        for i = 1:Nbins,
            in = indices(find(features(indices) == i));
            if ~isempty(in),
                if isfinite(tree.child(i)),
                    targets(in) = tree.child(i);
                else
                %No data was found in the training set for this bin, so choose a class randomly
                    n           = 1 + floor(rand(1)*length(Uc));
                    targets(in) = Uc(n);
                end
            end
        end
        return   %break is only valid inside loops in MATLAB; return ends the recursion here
    end
            
    %This is not the last level of the tree, so:
    %First, find the dimension we are to work on
    dim = tree.split_dim;
    dims= find(~ismember(1:size(features,1), dim));

    %And classify according to it
    for i = 1:Nbins,
        in      = indices(find(features(dim, indices) == i));
        targets = targets + use_tree(features(dims, :), in, tree.child(i), Nbins, Uc);
    end
        
    %END use_tree

    function tree = make_tree(features, targets, inc_node, Nbins)
    %Build a tree recursively

    [Ni, L]     = size(features);
    Uc          = unique(targets);

    %When to stop: If the dimension is one or the number of examples is small
    if ((Ni == 1) | (inc_node > L)),
        %Compute the children non-recursively
        for i = 1:Nbins,
            tree.split_dim  = 0;
            indices         = find(features == i);
            if ~isempty(indices),
                if (length(unique(targets(indices))) == 1),
                    tree.child(i) = targets(indices(1));
                else
                    H               = hist(targets(indices), Uc);
                    [m, T]          = max(H);
                    tree.child(i)   = Uc(T);
                end
            else
                tree.child(i)   = inf;
            end
        end
        return   %break is only valid inside loops in MATLAB; return ends the recursion here
    end

    %Compute the node's entropy I (loop over the classes, not the dimensions)
    for i = 1:length(Uc),
        Pnode(i) = length(find(targets == Uc(i))) / L;
    end
    Inode = -sum(Pnode.*log(Pnode)/log(2));

    %For each dimension, compute the gain ratio impurity
    delta_Ib    = zeros(1, Ni);
    P           = zeros(length(Uc), Nbins);
    for i = 1:Ni,
        for j = 1:length(Uc),
            for k = 1:Nbins,
                indices = find((targets == Uc(j)) & (features(i,:) == k));
                P(j,k)  = length(indices);
            end
        end
        Pk          = sum(P);
        P           = P/L;
        Pk          = Pk/sum(Pk);
        info        = sum(-P.*log(eps+P)/log(2));
        delta_Ib(i) = (Inode-sum(Pk.*info))/-sum(Pk.*log(eps+Pk)/log(2));
    end

    %Find the dimension maximizing the gain ratio delta_Ib
    [m, dim] = max(delta_Ib);

    %Split along the 'dim' dimension
    tree.split_dim = dim;
    dims           = find(~ismember(1:Ni, dim));
    for i = 1:Nbins,
        indices       = find(features(dim, :) == i);
        tree.child(i) = make_tree(features(dims, indices), targets(indices), inc_node, Nbins);
    end
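The gain-ratio expression in make_tree can be sanity-checked on a tiny made-up split; all numbers below are illustrative assumptions (5 samples, 2 classes, one feature binned into 2 pure bins), not data from the post.

```matlab
% Toy check of the gain-ratio formula used in make_tree.
Pnode = [2/5, 3/5];                 % class priors at the node
Inode = -sum(Pnode .* log2(Pnode)); % node entropy, about 0.971 bits
Pk    = [2/5, 3/5];                 % fraction of samples in each bin
info  = [0, 0];                     % per-bin entropy (both bins are pure)
% Information gain divided by the split information gives the gain ratio;
% for a pure split that exactly mirrors the class priors it equals 1:
gain_ratio = (Inode - sum(Pk .* info)) / (-sum(Pk .* log2(Pk)))
```

Because both bins are pure, the gain equals the node entropy, and since the bin proportions match the class priors the split information cancels it, giving a gain ratio of 1.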


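A hypothetical way to call the function, for readers wondering how to run it: this sketch assumes the companion helpers process_params, PCA and high_histogram (from the same toolbox as the posted code, not shown here) are on the MATLAB path, and the data and parameter values are made up.

```matlab
% Hypothetical invocation of the ID3 function above (assumed helper
% functions process_params, PCA, high_histogram must be on the path).
train_features = randn(2, 100);              % 2 features x 100 samples
train_targets  = double(rand(1, 100) > 0.5); % binary class labels 0/1
params = [10, 5];        % [number of bins, inc_node percentage]
region = [-3 3 -3 3 50]; % [-x x -y y number_of_points]
D = ID3(train_features, train_targets, params, region);
imagesc(D); colorbar;    % visualize the resulting decision surface
```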
    ----------------------------------------------
    数据挖掘青年 http://blogger.org.cn/blog/blog.asp?name=DMman
    纪录片之家 (many documentary downloads) http://www.jlpzj.com/?fromuid=137653

    (Posted 2007/7/20 23:45)
ws19841019 (2007/10/15 21:10):

There are errors!!
     
     
liqing_513box (2007/10/17 21:09):

How do you pass the function arguments in MATLAB? Could the original poster explain in detail how to run this?
     
DMman (2007/10/18 8:10):

Sorry, this is a repost; please work through it yourselves.
     