Author: admin  Posted: 2016-09-29 16:23:27
Category: R

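Data Input and Output in R

Most data used in R is read from files or databases. This section covers the commands needed for that kind of data input and output.

Changing Directories

To read a file in R, it is usually convenient to first move to the directory that contains it. The following commands handle the working directory, file listings, and raw file contents.

  • setwd: set the current working directory
  • getwd: print the current working directory
  • dir: list the files in the current working directory
  • readLines: read the contents of a file line by line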
In [6]:
setwd("~/data/sheather")
In [7]:
getwd()
'/home/dockeruser/data/sheather'
In [24]:
cat(dir(), sep="\n")
AdRevenue.csv
airfares.txt
ais.txt
anscombe.txt
banknote.txt
BayArea.txt
bimodal.txt
bonds2.txt
bonds.txt
bookstore.txt
Bordeaux.csv
boxoffice.txt
bridge.txt
CarlsenQ.txt
cars04.csv
caution.csv
Ch2Problem5.txt
challenge.txt
changeover_times.txt
circulation.txt
cleaning.txt
cleaningwtd.txt
confood1.txt
confood2.txt
creditscore.txt
curve.txt
defects.txt
diamonds.txt
FieldGoals2003to2006.csv
glakes.txt
GreatestGivers.xls
Haldcement.txt
HeartDisease.csv
HoustonChronicle.csv
HoustonRealEstate.txt
huber.txt
indicators.txt
invoices.txt
krafft.txt
Latour.txt
magazines.csv
Mantel.txt
MichelinFood.txt
MichelinNY.csv
MissAmericato2008.txt
nonlinearx.txt
nyc.csv
Orthodont.txt
overdue.txt
pgatour2006.csv
pigweights.csv
playbill.csv
playoffs.txt
production.txt
ProfessorSalaries.txt
profsalary.txt
prostateAlldata.txt
prostateTest.txt
prostateTraining.txt
responsetransformation.txt
resptrans.txt
salarygov.txt
sleepstudy.txt
storks.txt
travel.txt
In [14]:
cat(readLines("bonds.txt"), sep="\n")
Case	CouponRate	BidPrice
1	7	92.94
2	9	101.44
3	7	92.66
4	4.125	94.50
5	13.125	118.94
6	8	96.75
7	8.75	100.88
8	12.625	117.25
9	9.5	103.34
10	10.125	106.25
11	11.625	113.19
12	8.625	99.44
13	3	94.50
14	10.5	108.31
15	11.25	111.69
16	8.375	98.09
17	10.375	107.91
18	11.25	111.97
19	12.625	119.06
20	8.875	100.38
21	10.5	108.5
22	8.625	99.25
23	9.5	103.63
24	11.5	114.03
25	8.875	100.38
26	7.375	92.06
27	7.25	90.88
28	8.625	98.41
29	8.5	97.75
30	8.875	99.88
31	8.125	95.16
32	9	100.66
33	9.25	102.31
34	7	88
35	3.5	94.53
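The commands above can also be combined without leaving the session in a different directory. The following is a minimal sketch, assuming a scratch directory under tempdir() and a made-up file name sample.txt (neither is part of the original data set); the two data rows are copied from bonds.txt.

```r
# Hypothetical scratch directory, so the sketch does not depend on ~/data/sheather
data_dir <- file.path(tempdir(), "io_demo")
dir.create(data_dir, showWarnings = FALSE)

# Create a small tab-separated file to work with
sample_path <- file.path(data_dir, "sample.txt")
writeLines(c("Case\tCouponRate\tBidPrice", "1\t7\t92.94", "2\t9\t101.44"), sample_path)

# setwd() invisibly returns the previous directory, so it can be restored later
old_dir <- setwd(data_dir)
txt_files <- dir(pattern = "\\.txt$")           # only file names ending in .txt
first_lines <- readLines("sample.txt", n = 2)   # read just the first two lines
setwd(old_dir)                                  # go back to where we started

# The same file can also be read by its full path, without calling setwd() at all
all_lines <- readLines(sample_path)

print(txt_files)
cat(all_lines, sep = "\n")
```

Building paths with file.path() is often the safer choice in scripts, since the result does not depend on which directory happens to be current.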

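Reading Data from a File

The read.table command reads data from a file and returns it as a data frame.

read.table(file, header, sep)

  • file : path to the data file
  • header=FALSE : if TRUE, the first line is treated as a header line
  • sep="" : field separator; the default "" means whitespace (one or more spaces, tabs, or newlines)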
In [21]:
df1 <- read.table("bonds.txt", header=TRUE)
df1
Case CouponRate BidPrice
1 7.000 92.94
2 9.000 101.44
3 7.000 92.66
4 4.125 94.50
5 13.125 118.94
6 8.000 96.75
7 8.750 100.88
8 12.625 117.25
9 9.500 103.34
10 10.125 106.25
11 11.625 113.19
12 8.625 99.44
13 3.000 94.50
14 10.500 108.31
15 11.250 111.69
16 8.375 98.09
17 10.375 107.91
18 11.250 111.97
19 12.625 119.06
20 8.875 100.38
21 10.500 108.50
22 8.625 99.25
23 9.500 103.63
24 11.500 114.03
25 8.875 100.38
26 7.375 92.06
27 7.250 90.88
28 8.625 98.41
29 8.500 97.75
30 8.875 99.88
31 8.125 95.16
32 9.000 100.66
33 9.250 102.31
34 7.000 88.00
35 3.500 94.53
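The effect of the sep argument is easier to see on a small self-contained case. The sketch below assumes two temporary files created with tempfile(); the file contents are illustrative and reuse a few values from the listings in this post. It reads one file with the default whitespace separator and one with an explicit tab separator.

```r
# Hypothetical sample files written to temporary locations
ws_path  <- tempfile(fileext = ".txt")
tab_path <- tempfile(fileext = ".txt")
writeLines(c("Case CouponRate   BidPrice", "1 7      92.94", "2 9      101.44"), ws_path)
writeLines(c("Wine\tParkerPoints", "Haut Brion\t98", "Le Pin\t98"), tab_path)

# Default sep = "": any run of spaces or tabs separates fields
df_ws <- read.table(ws_path, header = TRUE)

# Explicit sep = "\t": needed here because a field itself contains a space
df_tab <- read.table(tab_path, header = TRUE, sep = "\t")

str(df_ws)
str(df_tab)
```

With the default separator, the second file would fail to parse as intended, because "Haut Brion" would be split into two fields.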

For CSV files, use the read.csv command. It works like read.table, but defaults to header=TRUE and sep=",".

In [22]:
df2 <- read.csv("Bordeaux.csv")
df2
Wine Price ParkerPoints CoatesPoints P95andAbove FirstGrowth CultWine Pomerol VintageSuperstar
Lafite 2850 100 19.5 1 1 0 0 0
Latour 2850 98 18.5 1 1 0 0 0
Margaux 2900 100 19.5 1 1 0 0 0
Mouton 2500 97 17.0 1 1 0 0 0
Haut Brion 2500 98 18.5 1 1 0 0 0
Cheval Blanc 3650 100 19.5 1 1 0 0 0
Ausone 4200 100 18.5 1 1 0 0 1
Petrus 10500 100 18.5 1 1 1 0 0
Pichon-Lalande 880 97 16.5 1 0 0 0 0
Pichon-Baron 550 96 17.5 1 0 0 0 0
Duhart-Milon 210 90 16.0 0 0 0 0 0
Batailley 150 87 15.5 0 0 0 0 0
Haut-Batailley 180 90 16.5 0 0 0 0 0
Grand-Puy-Lacoste 380 92 18.0 0 0 0 0 0
Lynch-Bages 620 95 16.0 1 0 0 0 0
Pontet-Canet 330 92 16.5 0 0 0 0 0
D'Armailhac 210 91 15.5 0 0 0 0 0
Clerc-Millon 225 91 16.0 0 0 0 0 0
Leoville-Las-Cases 1300 100 18.5 1 0 0 0 0
Leoville-Poyferre 465 95 17.5 1 0 0 0 0
Leoville-Barton 780 96 18.5 1 0 0 0 0
Gruaud-Larose 520 94 17.0 0 0 0 0 0
Ducru-Beaucaillou 680 94 18.5 0 0 0 0 0
Lagrange 260 93 15.0 0 0 0 0 0
Langoa-Barton 240 91 17.0 0 0 0 0 0
Saint-Pierre 180 89 16.5 0 0 0 0 0
Talbot 330 90 17.0 0 0 0 0 0
Beychevelle 240 91 16.5 0 0 0 0 0
Rauzan-Segla 420 90 17.5 0 0 0 0 0
Durfort-Vivens 180 88 17.0 0 0 0 0 0
La Tour Haut-Brion 310 92 17.0 0 0 0 0 0
Angelus 980 96 18.0 1 1 0 0 0
Beau-Sejour-Becot 380 93 17.0 0 0 0 0 0
Beausejour 450 92 17.5 0 0 0 0 0
Belair 250 87 16.5 0 0 0 0 0
Canon 360 89 18.0 0 0 0 0 0
Clos Fourtet 325 90 15.0 0 0 0 0 0
Figeac 520 93 18.0 0 0 0 0 0
La Gaffeliere 280 90 15.5 0 0 0 0 0
Magdelaine 350 92 18.0 0 0 0 0 0
Pavie 1600 100 14.5 1 1 0 0 0
Trottevieille 250 89 15.0 0 0 0 0 0
La Mondotte 2400 98 18.0 1 0 1 0 0
Troplong-Mondot 450 96 17.5 1 0 0 0 0
Pavie-Macquin 520 95 17.5 1 0 0 0 0
Tertre-Roteboeuf 1300 96 17.5 1 0 0 0 0
De Valandraud 1620 93 16.5 0 0 1 0 0
Trotanoy 800 92 18.5 0 0 0 1 0
La Fleur-Petrus 500 95 18.0 1 0 0 1 0
Latour-a-Pomerol 350 91 17.5 0 0 0 1 0
Vieux. Ch. Certan 840 94 18.0 0 0 0 1 0
Certran de May 550 91 16.0 0 0 0 1 0
La Conseillante 1250 96 17.5 1 0 0 1 0
L'Evangile 1500 96 18.0 1 0 0 1 0
Le Pin 10500 98 17.5 1 0 1 1 0
Lafleur 5000 100 18.0 1 0 1 1 0
Gazin 300 90 16.0 0 0 0 1 0
Clinet 700 92 15.5 0 0 0 1 0
L'Eglise-Clinet 1400 96 18.0 1 0 0 1 0
Clos L'Eglise 1220 96 17.0 1 0 0 1 0
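To illustrate how read.csv relates to read.table, here is a minimal sketch that assumes a small temporary CSV file (the file and its three columns are a made-up subset of the table above). It reads the same file both ways and compares the results.

```r
# Hypothetical CSV file with a header line and two data rows
csv_path <- tempfile(fileext = ".csv")
writeLines(c("Wine,Price,ParkerPoints",
             "Lafite,2850,100",
             "Haut Brion,2500,98"), csv_path)

# read.csv() assumes a header line and a comma separator
df_csv <- read.csv(csv_path)

# The same result with read.table(), spelling out both settings
df_tbl <- read.table(csv_path, header = TRUE, sep = ",")

str(df_csv)
all.equal(df_csv, df_tbl)
```

For simple files like this one the two calls should agree; they can differ on files containing quote characters or # symbols, because the two functions use different defaults for quote and comment.char.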

Writing Data to a File

Conversely, to write a data frame to a file, use the write.table or write.csv command.

In [23]:
write.table(df1, "bonds2.txt")
In [27]:
cat(dir(), sep="\n")
AdRevenue.csv
airfares.txt
ais.txt
anscombe.txt
banknote.txt
BayArea.txt
bimodal.txt
bonds2.txt
bonds.txt
bookstore.txt
Bordeaux.csv
boxoffice.txt
bridge.txt
CarlsenQ.txt
cars04.csv
caution.csv
Ch2Problem5.txt
challenge.txt
changeover_times.txt
circulation.txt
cleaning.txt
cleaningwtd.txt
confood1.txt
confood2.txt
creditscore.txt
curve.txt
defects.txt
diamonds.txt
FieldGoals2003to2006.csv
glakes.txt
GreatestGivers.xls
Haldcement.txt
HeartDisease.csv
HoustonChronicle.csv
HoustonRealEstate.txt
huber.txt
indicators.txt
invoices.txt
krafft.txt
Latour.txt
magazines.csv
Mantel.txt
MichelinFood.txt
MichelinNY.csv
MissAmericato2008.txt
nonlinearx.txt
nyc.csv
Orthodont.txt
overdue.txt
pgatour2006.csv
pigweights.csv
playbill.csv
playoffs.txt
production.txt
ProfessorSalaries.txt
profsalary.txt
prostateAlldata.txt
prostateTest.txt
prostateTraining.txt
responsetransformation.txt
resptrans.txt
salarygov.txt
sleepstudy.txt
storks.txt
travel.txt
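Called with only a data frame and a file name, write.table also writes row names and puts quotes around strings, which is not always wanted. The sketch below assumes a small data frame built by hand from three rows of bonds.txt and two temporary output files; it shows one way to write a plain tab-separated file and a CSV file, then read the CSV back as a check.

```r
# Small data frame using rows 1, 2 and 4 of bonds.txt
df <- data.frame(Case = c(1L, 2L, 4L),
                 CouponRate = c(7, 9, 4.125),
                 BidPrice = c(92.94, 101.44, 94.50))

# Hypothetical output locations
txt_out <- tempfile(fileext = ".txt")
csv_out <- tempfile(fileext = ".csv")

# Turn off row names and quoting to get a plain tab-separated file with a header
write.table(df, txt_out, sep = "\t", row.names = FALSE, quote = FALSE)

# write.csv() uses a comma separator; row.names = FALSE drops the extra first column
write.csv(df, csv_out, row.names = FALSE)

# Inspect the text file and read the CSV back in
cat(readLines(txt_out), sep = "\n")
back <- read.csv(csv_out)
all.equal(df, back)
```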
