Category: Big Data

  • Applied Machine Learning – things you need to know

    Some notes on applying Machine Learning to specific problems: Always use train_test_split or similar; GridSearchCV (built-in cross validation); HDF files need to be shrunk after a write/update -> ptrepack --chunkshape=auto --propindexes --complevel=9 --complib=blosc data_in.h5 data_out.h5; Use a Keras optimizer instead of TensorFlow's own (so that it can be saved later […]

  • How to get started with Hadoop – Hadoop basics

    One of the most painful jobs for a system engineer is building a whole system by installing multiple packages one by one. We all worry about incompatibilities and dependencies. With Hadoop, you can get a lot of help with that from HDP (Hortonworks Data Platform). Great tutorials and documentation can be found here. The order of methods you […]

  • Big Data references

    Some useful websites and courses for learning Big Data: [1] [2] [3] [4] [5] [6] [7] Updating …

  • MasterNotDiscoveredException in Elasticsearch

    Sometimes, when you want to join a node to an Elasticsearch cluster, this problem may occur (the reason may vary, but I think multicast discovery has some limitations here). Solution: uncomment those lines in elasticsearch.yml. We tell this host (node) to use unicast discovery instead of multicast, and then specify the master host manually […]

  • About the Chukwa released versions

    I’m working with some log collection and aggregation tools from the Apache project. When it came to Chukwa, I read the project’s introduction and release notes and didn’t know what to do, because it seemed like Chukwa had been in and out for a while and was a bit obsolete. So I decided to email the […]

  • Hadoop 2.2 and Flume 1.4 Protobuf Problem and Solution

    I have to say a big THANK YOU to the author of “Hadoop in Practice”, Alex Holmes. Source: The problem you may encounter while trying to integrate Hadoop 2.2 and Flume 1.4 is an incompatibility between protobuf versions: 2014-04-15 13:56:23,251 (SinkRunner-PollingRunner-DefaultSinkProcessor) [ERROR - org.apache.flume.sink.hdfs.HDFSEventSink.process(] process failed java.lang.VerifyError: class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$RecoverLeaseRequestProto overrides final method getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet; […]
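
Going back to the first tip in the Applied Machine Learning entry (always hold out data and cross-validate), here is a minimal plain-Python sketch of what a hold-out split and k-fold cross-validation do with sample indices. The helper names `train_test_split_indices` and `k_fold_indices` are hypothetical and exist only for this illustration; this is not how scikit-learn implements it, and in practice you would most likely reach for `sklearn.model_selection.train_test_split` and `GridSearchCV` as the post suggests.

```python
import random


def train_test_split_indices(n_samples, test_fraction=0.25, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into (train, test).

    Hypothetical helper for illustration only.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # seeded, so the split is repeatable
    n_test = max(1, int(round(n_samples * test_fraction)))
    return indices[n_test:], indices[:n_test]


def k_fold_indices(indices, k=5):
    """Yield (train, validation) index lists, one pair per fold.

    Hypothetical helper for illustration only.
    """
    folds = [indices[i::k] for i in range(k)]  # k roughly equal slices
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


# Hold out a test set first; it is never used while tuning.
train_idx, test_idx = train_test_split_indices(20, test_fraction=0.25, seed=42)

# Cross-validate on the training part only.
cv_splits = list(k_fold_indices(train_idx, k=5))
```

The point of the sketch is the ordering: the test indices are set aside before any cross-validation, so hyperparameters are tuned only on the training portion and the test set stays an unbiased final check.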