如果你还没看过Flume-ng源码解析系列中的启动流程、Channel组件和Sink组件,可以点击下面链接: Flume-ng源码解析之启动流程 Flume-ng源码解析之Channel组件 Flume-ng源码解析之Sink组件
在前面三篇文章中我们初步了解了Flume的启动流程、Channel组件和Sink组件,接下来我们一起来看看agent三大组件中Source组件。
Source,作为agent中的消息来源组件,我们来看看它是如何将event传递给channel的和它的特性。
依然先看代码:
@InterfaceAudience.Public @InterfaceStability.Stable public interface Source extends LifecycleAware, NamedComponent { public void setChannelProcessor(ChannelProcessor channelProcessor); public ChannelProcessor getChannelProcessor(); }我们可以看到它里面定义的两个需要实现方法是getChannelProcessor和setChannelProcessor,我们大概可以猜到,source就是通过ChannelProcessor将event传输给channel的。
这里先来了解一下Source的类型,Flume根据数据来源的特性将Source分成两类类,像Http、netcat和exec等就是属于事件驱动型(EventDrivenSource),而kafka和Jms等就是属于轮询拉取型(PollableSource)。
据我们在启动流程中了解到的,Application是先启动SourceRunner,再由SourceRunner来启动source,那么既然source有两种类型,那么Sourcerunner也分为EventDrivenSourceRunner和PollableSourceRunner,我们来看看它们的start():
EventDrivenSourceRunner
public class EventDrivenSourceRunner extends SourceRunner { … @Override public void start() { Source source = getSource(); ChannelProcessor cp = source.getChannelProcessor(); cp.initialize(); source.start(); lifecycleState = LifecycleState.START; } … }PollableSourceRunner
public class PollableSourceRunner extends SourceRunner { … @Override public void start() { PollableSource source = (PollableSource) getSource(); ChannelProcessor cp = source.getChannelProcessor(); cp.initialize(); source.start(); runner = new PollingRunner(); runner.source = source; runner.counterGroup = counterGroup; runner.shouldStop = shouldStop; runnerThread = new Thread(runner); runnerThread.setName(getClass().getSimpleName() + "-" + source.getClass().getSimpleName() + "-" + source.getName()); runnerThread.start(); lifecycleState = LifecycleState.START; } … public static class PollingRunner implements Runnable { private PollableSource source; private AtomicBoolean shouldStop; private CounterGroup counterGroup; @Override public void run() { logger.debug("Polling runner starting. Source:{}", source); while (!shouldStop.get()) { counterGroup.incrementAndGet("runner.polls"); try { if (source.process().equals(PollableSource.Status.BACKOFF)) { counterGroup.incrementAndGet("runner.backoffs"); Thread.sleep(Math.min( counterGroup.incrementAndGet("runner.backoffs.consecutive") * source.getBackOffSleepIncrement(), source.getMaxBackOffSleepInterval())); } else { counterGroup.set("runner.backoffs.consecutive", 0L); } } catch (InterruptedException e) { logger.info("Source runner interrupted. Exiting"); counterGroup.incrementAndGet("runner.interruptions"); } catch (EventDeliveryException e) { logger.error("Unable to deliver event. Exception follows.", e); counterGroup.incrementAndGet("runner.deliveryErrors"); } catch (Exception e) { counterGroup.incrementAndGet("runner.errors"); logger.error("Unhandled exception, logging and sleeping for " + source.getMaxBackOffSleepInterval() + "ms", e); try { Thread.sleep(source.getMaxBackOffSleepInterval()); } catch (InterruptedException ex) { Thread.currentThread().interrupt(); } } } logger.debug("Polling runner exiting. Metrics:{}", counterGroup); } } }无论是PollableSourceRunner还是EventDrivenSourceRunner,都是调用它里面的source的start()。这个时候我们看到ChannelProcessor的存在,那么就会有疑惑,这ChannelProcessor哪来的?我们还是得看回AbstarctConfigurationProvider,查看里面的loadSources(),我们就会发现下面这段代码:
ChannelSelectorConfiguration selectorConfig = config.getSelectorConfiguration(); ChannelSelector selector = ChannelSelectorFactory.create(sourceChannels, selectorConfig); ChannelProcessor channelProcessor = new ChannelProcessor(selector); Configurables.configure(channelProcessor, config); source.setChannelProcessor(channelProcessor);到这里我们基本已经了解了Source的启动流程,下面以AvroSource为例看看,source是在哪里调用ChannelProcessor的插入方法。
在append方法中我们可以看到getChannelProcessor().processEvent(event);,所以不同的Source根据它的不同触发机制和拉取机制,在特定的时候调用ChannelProcessor来执行event的插入。 ·
到此为止,我们就完成了对Flume启动流程和三大组件的研究,鉴于能力,其中有些细节没办法深入研究,希望以后有时间能够继续深入分析下去。