Load a CSV file in 1 Minute

In this tutorial, you will learn how to load 1 million rows of csv file into the Machbase. Machbase provides csvimort which is a convenient utility to load a csv file.

Step 1 : Install and Start Machbase

You can install and start Machbase with a docker image which is provded by Machbase. Please refer to the tutorial, Install Machbase in 1 minute for details.

Step 2 : Verify the csv file to load

Machbase provides a sample csv file in $MACHBASE_HOME/tutorials/csvload directory. The part of the sample csv file is shown in below.

$ cd $MACHBASE_HOME/tutorials/csvload

$ more sample_data.csv
2015-05-20 06:00:00,63.214.191.124,2296,122.195.164.32,5416,12,GET /twiki/bin/view/Main/TWikiGroups?rev=1.2 HTTP/1.1,
200,5162
2015-05-20 06:00:07,212.237.153.79,6203,71.129.68.118,8859,67,GET /twiki/bin/view/Main/WebChanges HTTP/1.1,200,40520
2015-05-20 06:00:07,243.9.49.80,344,122.195.164.32,6203,46,GET /twiki/bin/view/Main/TWikiGroups?rev=1.2 HTTP/1.1,200,
5162
2015-05-20 06:00:07,232.191.241.129,5377,174.47.129.59,1247,17,GET /mailman/listinfo/hsdivision HTTP/1.1,200,6291
2015-05-20 06:00:07,121.67.24.216,2296,212.237.153.79,6889,68,GET /twiki/bin/view/TWiki/WebTopicEditTemplate HTTP/1.1
,200,3732
2015-05-20 06:00:07,31.224.72.52,450,100.46.183.122,10541,20,GET /twiki/bin/view/Main/WebChanges HTTP/1.1,200,40520
2015-05-20 06:00:07,210.174.159.227,6180,173.149.119.202,6927,2,GET /twiki/bin/rdiff/TWiki/AlWilliams?rev1=1.2&rev2=1
.1 HTTP/1.1,200,5234
2015-05-20 06:00:07,210.174.159.227,10124,16.194.51.72,10512,69,GET /twiki/bin/rdiff/TWiki/AlWilliams?rev1=1.2&rev2=1
.1 HTTP/1.1,200,5234
2015-05-20 06:00:07,60.48.99.15,12333,85.183.139.166,12020,64,GET /robots.txt HTTP/1.1,200,68
...

Step 3 : Load 1M csv file

1) Insert datetime field into _arrival_time column.
As you can see, the first field of csv file is date time value. It can be stored in _arrival_time which is a hidden column and created automatically.
If data is stored in the '_arrival_time' column, you can use the time series syntax such as DURATION.
Please refer to the tutorial, time series query in 1 minute for more details.

First, you need to a create table. The table name will be SAMPLE_TABLE in this example and the script is in below.

CREATE TABLE SAMPLE_TABLE
(
    srcip       IPV4,
    srcport     INTEGER,
    dstip       IPV4,
    dstport     INTEGER,
    protocol    SHORT,
    eventlog    VARCHAR(1204),
    eventcode   SHORT,
    eventsize   LONG
 );
# go to csvload directory
$cd $MACHBASE_HOME/tutorials/csvload

# create SAMPLE_TABLE with machsql command.
$machsql -s localhost -u sys -p manager -f create_sample_table.sql

# load s sample_data.csv  into Machbase with _arrival_time and datetime format option.
$csvimport -t sample_table -d sample_data.csv -a -F "_arrival_time YYYY-MM-DD HH24:MI:SS"

Note
There is a shell script to execute commands that create table and load sample_data.csv. You can just run load_sample.sh
$ sh load_sample.sh

2) Insert datetime field in user-defined column.

You can insert a first field in sample_data.csv into not _arrival_time column but user-defined column. you need to add datetime column like below.

CREATE TABLE SAMPLE2_TABLE
(
    dt          DATETIME,
    srcip       IPV4,
    srcport     INTEGER,
    dstip       IPV4,
    dstport     INTEGER,
    protocol    SHORT,
    eventlog    VARCHAR(1204),
    eventcode   SHORT,
    eventsize   LONG
 );
# go to csvload directory
$cd $MACHBASE_HOME/tutorials/csvload

# create SAMPLE2_TABLE with machsql command.
$machsql -s localhost -u sys -p manager -f create_sample2_table.sql

# load s sample_data.csv  into Machbase with datetime format option.
$csvimport -t sample2_table -d sample_data.csv -F "dt YYYY-MM-DD HH24:MI:SS"

Note:
Since it is a shell script to execute the command that both creates a table and loads a sample_data.csv, you can just run the 'load_sample2.sh'.
$ sh load_sample2.sh

3) Create a table with header in csv file
In the above example, the data type of each field of the csv file is checked, a table is generated in advance, and a file is loaded. However, if there is a header in the csv file, you can load the data while generating the table using the header name. At this time, the data type of the generated table column is VARCHAR (32767).

Machbase provides 'awards.csv' file and here is the part of 'awards.csv'.

playerID,awardID,yearID,lgID,tie,notes
bondto01,Pitching Triple Crown,1877,NL,,
hinespa01,Triple Crown,1878,NL,,
heckegu01,Pitching Triple Crown,1884,AA,,
radboch01,Pitching Triple Crown,1884,NL,,
oneilti01,Triple Crown,1887,AA,,
keefeti01,Pitching Triple Crown,1888,NL,,
clarkjo01,Pitching Triple Crown,1889,NL,,
rusieam01,Pitching Triple Crown,1894,NL,,
duffyhu01,Triple Crown,1894,NL,,
youngcy01,Pitching Triple Crown,1901,AL,,
lajoina01,Triple Crown,1901,AL,,

It will create the AWARDS table with header in awards.csv and load csv.
-C option is for creating table and -H option is for setting column name with header.

$ cd $MACHBASE_HOME/tutorials/csvload

$ csvimport -t awards -d awards.csv -C -H

$ machsql -s localhost -u sys -p manager
=================================================================
     Machbase Client Query Utility
     Release Version 3.5.0.826b8f2
     Copyright 2014 Machbase Inc. or its subsidiaries.
     All Rights Reserved.
=================================================================
MACHBASE_CONNECT_MODE=INET, PORT=5656
Type 'help' to display a list of available commands.

Mach> SHOW TABLES

Mach> DESC AWARDS;

Mach> SELECT COUNT(*) FROM AWARDS;

Step 4 : Retrieve data from Machbase
Machbase provides machsql which is a interactive sql tool.


$ machsql
=================================================================
     Machbase Client Query Utility
     Release Version 3.5.0.826b8f2
     Copyright 2014 Machbase Inc. or its subsidiaries.
     All Rights Reserved.
=================================================================
Machbase server address (Default:127.0.0.1) :       <== press the enter key.
Machbase user ID  (Default:SYS)                     <== press the enter key.
Machbase User Password :                            <== type in 'manager'
MACH_CONNECT_MODE=INET, PORT=5656
Type 'help' to display a list of available commands.

Mach> SHOW TABLES

Mach> DESC SAMPLE_TABLE;

Mach> SELECT COUNT(*) FROM SAMPLE_TABLE;

Mach> SELECT SRCIP, COUNT(*) FROM SAMPLE_TABLE GROUP BY SRCIP ORDER BY 2 DESC LIMIT 10;

results matching ""

    No results matching ""