Sentiment Analysis: Datasets

    xiaoxiao, 2022-04-10

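    1. The Stanford Sentiment Treebank (SST) from Stanford University has become a standard benchmark dataset. It supports two tasks. The binary task uses 6,920/872/1,821 sentences for training/validation/testing. The five-class task (very negative, negative, neutral, positive, very positive) covers 11,855 sentences and 215,154 phrases, each labeled with one of the 5 classes, and is split into 8,544 training, 1,101 validation, and 2,210 test sentences.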

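    2. IMDB: 100,000 review texts in total: 25,000 labeled training reviews (2 classes) and 25,000 test reviews, already class-balanced, plus 50,000 unlabeled reviews.

    3. Yelp: includes business, user, review, tip, and checkin data.

    4. Amazon reviews

    5. SemEval

    SemEval 2014: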

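        Restaurant domain: 3,841 sentences (3,041 training, 800 test);

        Laptop domain: 3,845 sentences (3,045 training, 800 test).

        Four polarity classes in total: pos, neg, neu, and conflict. Tang's paper does not consider the conflict class.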

    Dataset            Pos.   Neg.   Neu.   Total
    Laptop-Train        994    870    464    2328
    Laptop-Test         341    128    169     638
    Restaurant-Train   2164    807    637    3608
    Restaurant-Test     728    196    196    1120
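
    The per-class counts above (positive, negative, neutral for each of Laptop-Train, Laptop-Test, Restaurant-Train, and Restaurant-Test) can be turned into class proportions to see how imbalanced each split is. The following is a minimal sketch; the names `COUNTS` and `class_shares` are hypothetical and chosen only for illustration.

```python
# Label counts (positive, negative, neutral) copied from the SemEval 2014
# statistics listed above. The dictionary name is an illustrative assumption.
COUNTS = {
    "Laptop-Train": (994, 870, 464),
    "Laptop-Test": (341, 128, 169),
    "Restaurant-Train": (2164, 807, 637),
    "Restaurant-Test": (728, 196, 196),
}


def class_shares(split_name):
    """Return the total and the fraction of each polarity for one split."""
    pos, neg, neu = COUNTS[split_name]
    total = pos + neg + neu
    return {
        "total": total,
        "pos": pos / total,
        "neg": neg / total,
        "neu": neu / total,
    }


if __name__ == "__main__":
    for name in COUNTS:
        shares = class_shares(name)
        print(
            f"{name}: total={shares['total']}, "
            f"pos={shares['pos']:.1%}, neg={shares['neg']:.1%}, neu={shares['neu']:.1%}"
        )
```

    The totals computed this way match the second number on each of the last four rows (2328, 638, 3608, 1120), and positive is the largest class in every split.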

    Data example:

    <sentence id="2573"><text>Although we were looking for regular lettuce and some walnuts the salads we got were great.</text><aspectTerms><aspectTerm term="salads" polarity="positive" from="66" to="72"/><aspectTerm term="lettuce" polarity="neutral" from="37" to="44"/><aspectTerm term="walnuts" polarity="neutral" from="54" to="61"/></aspectTerms><aspectCategories><aspectCategory category="food" polarity="positive"/></aspectCategories></sentence>
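
    In the example above, the `from` and `to` attributes appear to be character offsets into the sentence text (start inclusive, end exclusive). The sketch below reads one such sentence with Python's standard `xml.etree.ElementTree` and checks each offset against the `term` attribute. The function name `parse_sentence` and the constant `SAMPLE` are hypothetical; only the XML itself comes from the dataset example.

```python
import xml.etree.ElementTree as ET

# The SemEval 2014 restaurant sentence shown above, copied verbatim.
SAMPLE = (
    '<sentence id="2573"><text>Although we were looking for regular lettuce '
    'and some walnuts the salads we got were great.</text><aspectTerms>'
    '<aspectTerm term="salads" polarity="positive" from="66" to="72"/>'
    '<aspectTerm term="lettuce" polarity="neutral" from="37" to="44"/>'
    '<aspectTerm term="walnuts" polarity="neutral" from="54" to="61"/>'
    '</aspectTerms><aspectCategories>'
    '<aspectCategory category="food" polarity="positive"/>'
    '</aspectCategories></sentence>'
)


def parse_sentence(sentence_xml):
    """Return (text, aspect_terms, aspect_categories) for one <sentence>."""
    elem = ET.fromstring(sentence_xml)
    text = elem.findtext("text")
    terms = []
    for term in elem.iter("aspectTerm"):
        start, end = int(term.get("from")), int(term.get("to"))
        # Slice the text with the offsets so the caller can compare it
        # with the annotated term string.
        terms.append((term.get("term"), term.get("polarity"), text[start:end]))
    categories = [
        (cat.get("category"), cat.get("polarity"))
        for cat in elem.iter("aspectCategory")
    ]
    return text, terms, categories


if __name__ == "__main__":
    text, terms, categories = parse_sentence(SAMPLE)
    for term, polarity, span in terms:
        print(term, polarity, span == term)
    print(categories)
```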

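    SemEval 2015:

        Restaurant domain: 2,000 training sentences (350 reviews), 48 validation sentences (10 reviews), and 676 test sentences (90 reviews);

        Laptop domain: 2,500 training sentences (450 reviews), 55 validation sentences (10 reviews), and 808 test sentences (80 reviews).

        Three polarity classes in total: pos, neg, and neu.

    Restaurant data example: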

    <Review rid="1028246"><sentences><sentence id="1028246:0"><text>Went on a 3 day oyster binge, with Fish bringing up the closing, and I am so glad this was the place it O trip ended, because it was so great!</text><Opinions><Opinion target="Fish" category="RESTAURANT#GENERAL" polarity="positive" from="35" to="39"/></Opinions></sentence><sentence id="1028246:1"><text>Service was devine, oysters where a sensual as they come, and the price can't be beat!!!</text><Opinions><Opinion target="Service" category="SERVICE#GENERAL" polarity="positive" from="0" to="7"/><Opinion target="oysters" category="FOOD#QUALITY" polarity="positive" from="20" to="27"/><Opinion target="NULL" category="RESTAURANT#PRICES" polarity="positive" from="0" to="0"/></Opinions></sentence><sentence id="1028246:2"><text>You can't go wrong here.</text><Opinions><Opinion target="NULL" category="RESTAURANT#GENERAL" polarity="positive" from="0" to="0"/></Opinions></sentence></sentences></Review>
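
    In the SemEval 2015 format, each `Opinion` carries a `category` of the form `ENTITY#ATTRIBUTE`, and an implicit target is written as `target="NULL"` with offsets 0 to 0. The sketch below, again using the standard `xml.etree.ElementTree`, flattens a review into one record per opinion. The function name `extract_opinions` and the record keys are hypothetical; the sample is the second sentence of the review above. Treating a missing `target` attribute the same as `NULL` is an assumption made so that the same function also accepts records without aspect terms.

```python
import xml.etree.ElementTree as ET

# One sentence from the SemEval 2015 restaurant review shown above.
SAMPLE_REVIEW = """<Review rid="1028246"><sentences><sentence id="1028246:1"><text>Service was devine, oysters where a sensual as they come, and the price can't be beat!!!</text><Opinions><Opinion target="Service" category="SERVICE#GENERAL" polarity="positive" from="0" to="7"/><Opinion target="oysters" category="FOOD#QUALITY" polarity="positive" from="20" to="27"/><Opinion target="NULL" category="RESTAURANT#PRICES" polarity="positive" from="0" to="0"/></Opinions></sentence></sentences></Review>"""


def extract_opinions(review_xml):
    """Return a list of dicts, one per <Opinion> in the review."""
    root = ET.fromstring(review_xml)
    rows = []
    for sentence in root.iter("sentence"):
        text = sentence.findtext("text")
        for opinion in sentence.iter("Opinion"):
            target = opinion.get("target")
            # Assumption: a missing target attribute and the literal
            # string "NULL" both mean "no explicit target".
            if target in (None, "NULL"):
                target = None
            entity, attribute = opinion.get("category").split("#")
            rows.append(
                {
                    "sentence_id": sentence.get("id"),
                    "text": text,
                    "target": target,
                    "entity": entity,
                    "attribute": attribute,
                    "polarity": opinion.get("polarity"),
                }
            )
    return rows


if __name__ == "__main__":
    for row in extract_opinions(SAMPLE_REVIEW):
        print(row["sentence_id"], row["target"], row["entity"], row["attribute"], row["polarity"])
```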

    Laptop data example (aspect terms are not annotated):

    <Review rid="10"><sentences><sentence id="10:0"><text>the laptop was really good and it goes really fast just the way i thought it would of run.</text><Opinions><Opinion category="LAPTOP#GENERAL" polarity="positive"/><Opinion category="LAPTOP#OPERATION_PERFORMANCE" polarity="positive"/></Opinions></sentence><sentence id="10:1"><text>i would really recommend to any person out there to get this laptop cause its really worth it.</text><Opinions><Opinion category="LAPTOP#GENERAL" polarity="positive"/></Opinions></sentence><sentence id="10:2"><text>and its really cheap and you wont regret buying it.</text><Opinions><Opinion category="LAPTOP#PRICE" polarity="positive"/><Opinion category="LAPTOP#GENERAL" polarity="positive"/></Opinions></sentence></sentences></Review>

    6. Stanford Twitter Sentiment (STS): contains 1.6M tweets (2 classes). The authors randomly selected 80K tweets as the training set, 16K as the validation set, and 498 as the test set.
