Wush Wu
National Taiwan University
[1] Qn1 Qn1 Qn1 Qn1 Qn1 Qn1
12 Levels: Qn1 < Qn2 < Qn3 < Qc1 < Qc3 < Qc2 < Mn3 < Mn2 < Mn1 < ... < Mc1
typedef unsigned int SEXPTYPE;
#define NILSXP 0 /* nil = NULL */
#define SYMSXP 1 /* symbols */
#define LISTSXP 2 /* lists of dotted pairs */
#define CLOSXP 3 /* closures */
#define ENVSXP 4 /* environments */
...
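The SEXPTYPE codes above surface at the R level through typeof(). The following is a minimal sketch, assuming a standard R session, that pairs each of the five listed codes with an R object of that type. The variable name `types` is only for illustration and is not part of the original material.

```r
# Map each SEXPTYPE listed above to the string that typeof() reports
# for an R object of that internal type.
types <- c(
  NILSXP  = typeof(NULL),             # nil = NULL
  SYMSXP  = typeof(quote(x)),         # a symbol (unevaluated name)
  LISTSXP = typeof(pairlist(a = 1)),  # a dotted-pair list
  CLOSXP  = typeof(function(x) x),    # a closure (ordinary R function)
  ENVSXP  = typeof(globalenv())       # an environment
)
print(types)
```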
g <- lm(dist ~ speed, cars)
str(head(g))
List of 6
$ coefficients : Named num [1:2] -17.58 3.93
..- attr(*, "names")= chr [1:2] "(Intercept)" "speed"
$ residuals : Named num [1:50] 3.85 11.85 -5.95 12.05 2.12 ...
..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
$ effects : Named num [1:50] -303.914 145.552 -8.115 9.885 0.194 ...
..- attr(*, "names")= chr [1:50] "(Intercept)" "speed" "" "" ...
$ rank : int 2
$ fitted.values: Named num [1:50] -1.85 -1.85 9.95 9.95 13.88 ...
..- attr(*, "names")= chr [1:50] "1" "2" "3" "4" ...
$ assign : int [1:2] 0 1
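The str() output above suggests that a fitted model is an ordinary list with a class attribute. A small sketch to check this, assuming the built-in `cars` data set; the names `g` and `first_six` are illustrative:

```r
# Fit the same model as above and inspect how the result is stored.
g <- lm(dist ~ speed, data = cars)

class(g)    # the S3 class used for method dispatch
typeof(g)   # the underlying storage type
is.list(g)  # TRUE if the object is a list underneath

# The first six component names match the six entries shown by str(head(g)).
first_six <- names(g)[1:6]
print(first_six)
```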
c(T, F, TRUE, FALSE)
[1] TRUE FALSE TRUE FALSE
c(1L, 2L, 3L, 4L, 0xaL)
[1] 1 2 3 4 10
c(1.0, .1, 1e-2, 1e2, 1.2e2)
[1] 1.00 0.10 0.01 100.00 120.00
c("1", "a", "中文")
[1] "1" "a" "中文"
c("a\0b")
Error: nul character not allowed (line 1)
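The four vectors above are the basic atomic types: logical, integer, double and character. A minimal sketch that confirms this with typeof(); the variable names are made up for illustration, and the character vector uses only ASCII strings to avoid encoding issues:

```r
# One vector per atomic type, written the same way as in the examples above.
v <- list(
  logical   = c(T, F, TRUE, FALSE),
  integer   = c(1L, 2L, 3L, 4L, 0xaL),   # 0xaL is the hexadecimal integer 10
  double    = c(1.0, .1, 1e-2, 1e2, 1.2e2),
  character = c("1", "a", "b")
)

# typeof() reports the storage type of each vector.
types <- sapply(v, typeof)
print(types)
```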
Please complete the following swirl courses to practice working with the R objects introduced above:
RBasic-02-Data-Structure-Vectors
RBasic-03-Data-Structure-Object
factor: an example
head(CO2$Type)
[1] Quebec Quebec Quebec Quebec Quebec Quebec
Levels: Quebec Mississippi
The truth about factor
dput(CO2$Type)
structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("Quebec", "Mississippi"
), class = "factor")
The truth about factor
attributes(CO2$Type)
$levels
[1] "Quebec" "Mississippi"
$class
[1] "factor"
The dput function prints labels such as .Label, but these are not the real attribute names. The following is excerpted from the help page for structure:
Adding a class "factor" will ensure that numeric codes are given integer storage mode.
For historical reasons (these names are used when deparsing), attributes ".Dim", ".Dimnames", ".Names", ".Tsp" and ".Label" are renamed to "dim", "dimnames", "names", "tsp" and "levels".
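The dput() and attributes() output above shows that a factor is an integer vector carrying a "levels" attribute and the class "factor". The following sketch, with illustrative variable names `f` and `g`, builds the same factor in two ways and compares them:

```r
# Build a factor the usual way.
f <- factor(c("Quebec", "Mississippi", "Quebec"),
            levels = c("Quebec", "Mississippi"))

# Build it by hand: integer codes plus the "levels" and "class" attributes.
g <- structure(c(1L, 2L, 1L),
               levels = c("Quebec", "Mississippi"),
               class = "factor")

typeof(f)        # storage type of the factor
as.integer(f)    # the integer codes behind the labels
identical(f, g)  # whether both constructions give the same object
```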
RBasic-04-Factors: practice working with R factor objects.
> x <- matrix(1:4, 2, 2)
> x
[,1] [,2]
[1,] 1 3
[2,] 2 4
> class(x)
[1] "matrix"
> attributes(x)
$dim
[1] 2 2
> attributes(x) <- NULL
> x # same as 1:4
[1] 1 2 3 4
> attr(x, "dim") <- c(2, 2, 1)
> x # same values as 1:4, now with dim 2 x 2 x 1
, , 1
[,1] [,2]
[1,] 1 3
[2,] 2 4
> class(x)
[1] "array"
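The transcript above shows that a matrix or array is a plain vector plus a "dim" attribute. A short sketch of the same idea using dim<- on a six-element vector; is.matrix() and is.array() are used instead of class() because the output of class() for a matrix differs between R versions:

```r
x <- 1:6
before <- is.matrix(x)        # a plain vector is not a matrix

dim(x) <- c(2, 3)             # two dimensions: now a 2 x 3 matrix
as_matrix <- is.matrix(x)
corner <- x[2, 3]             # column-major fill: last element is 6

dim(x) <- c(1, 2, 3)          # three dimensions: an array, no longer a matrix
as_array <- is.array(x) && !is.matrix(x)

dim(x) <- NULL                # dropping "dim" gives back the plain vector
back_to_vector <- identical(x, 1:6)
```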
RBasic-05-Arrays-Matrices: practice working with R matrix and array objects.
> x <- 1:10
> class(x)
[1] "integer"
> x[1] <- "1"
> class(x)
[1] "character"
> x
[1] "1" "2" "3" "4" "5" "6" "7" "8" "9" "10"
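The example above shows that assigning a string into an integer vector converts the whole vector to character, because an atomic vector holds a single type. A minimal sketch of the same coercion for other type combinations; the variable names are illustrative:

```r
# Mixing types in c() coerces everything to the most general type involved.
t1 <- typeof(c(TRUE, 1L))   # logical + integer
t2 <- typeof(c(1L, 1.5))    # integer + double
t3 <- typeof(c(1.5, "a"))   # double + character

# Assignment into an existing vector coerces in the same way.
x <- 1:3
x[1] <- 2.5
t4 <- typeof(x)

print(c(t1, t2, t3, t4))
```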
1:5
x <- list(1:5, c("a", "b"))
[
x[1]
[[
x[[1]]
names
> x <- list(1:5, c("a", "b"))
> x
[[1]]
[1] 1 2 3 4 5
[[2]]
[1] "a" "b"
> attributes(x)
NULL
names
> x <- list(a = 1:5, b = c("a", "b"))
> x
$a
[1] 1 2 3 4 5
$b
[1] "a" "b"
> attributes(x)
$names
[1] "a" "b"
$
> x <- list(a = 1:5, b = c("a", "b"))
> x$a
[1] 1 2 3 4 5
> x$b
[1] "a" "b"
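The three list accessors shown above differ in what they return: `[` gives a sub-list, while `[[` and `$` give the element itself. A small sketch comparing them on the same list; `sub` and `el` are illustrative names:

```r
x <- list(a = 1:5, b = c("a", "b"))

sub <- x[1]     # single bracket: a list of length 1 that keeps the name "a"
el  <- x[[1]]   # double bracket: the element itself, an integer vector

is.list(sub)                 # TRUE
is.list(el)                  # FALSE
identical(x$a, x[["a"]])     # $ is shorthand for [[ with a name
```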
data.frame
data.frame is R's solution for working with structured data, and it has become the paradigm for handling "structured data".
The main driver for Distributed DataFrame is to have a cluster-based, big data representation that’s friendly to the RDBMSs and data science community. Specifically we leverage SQL’s table and R’s data.frame concepts, taking advantage of 30 years of SQL development and R’s accumulated data science wisdom.
> head(iris)
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
2 4.9 3.0 1.4 0.2 setosa
3 4.7 3.2 1.3 0.2 setosa
4 4.6 3.1 1.5 0.2 setosa
5 5.0 3.6 1.4 0.2 setosa
6 5.4 3.9 1.7 0.4 setosa
> class(iris)
[1] "data.frame"
> is.list(iris)
[1] TRUE
> head(iris[[1]])
[1] 5.1 4.9 4.7 4.6 5.0 5.4
> iris[1,]
Sepal.Length Sepal.Width Petal.Length Petal.Width Species
1 5.1 3.5 1.4 0.2 setosa
> iris[1,1]
[1] 5.1
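As is.list(iris) above indicates, a data.frame is a list of equal-length columns with a few extra attributes. A minimal sketch on a small hand-made data.frame; the column names `id` and `name` are invented for illustration:

```r
df <- data.frame(id = 1:3,
                 name = c("a", "b", "c"),
                 stringsAsFactors = FALSE)

typeof(df)                       # the storage type underneath the class
col_lengths <- sapply(df, length)  # every column has the same length
attr_names <- names(attributes(df))

first_col <- df[[1]]             # list-style access returns a column
cell <- df[2, "name"]            # matrix-style access returns a single cell
```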
RBasic-06-List-DataFrame: hands-on practice with list and data.frame.
object.size
object.size(logical(0))
40 bytes
object.size(rep(TRUE, 1000))
4040 bytes
object.size(rep(TRUE, 1e6))
4000040 bytes
object.size
object.size(integer(0))
40 bytes
object.size(seq(1L, by = 1L, length = 1e3))
4040 bytes
object.size(seq(1L, by = 1L, length = 1e6))
4000040 bytes
object.size
object.size(numeric(0))
40 bytes
object.size(seq(0, by = 1, length = 1000))
8040 bytes
object.size(seq(0, by = 1, length = 1e6))
8000040 bytes
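The sizes above imply a fixed per-vector overhead plus 4 bytes per logical or integer element and 8 bytes per double element. The sketch below recomputes the per-element cost by subtracting the size of an empty vector; `per_element` is a hypothetical helper, and the fixed overhead itself may differ from 40 bytes depending on the R version and platform:

```r
# Hypothetical helper: bytes per element, with the empty-vector overhead
# subtracted so that the result does not depend on the header size.
per_element <- function(make, n = 1e6) {
  full  <- as.numeric(object.size(make(n)))
  empty <- as.numeric(object.size(make(0)))
  (full - empty) / n
}

bytes <- c(
  logical = per_element(logical),
  integer = per_element(integer),
  double  = per_element(numeric)
)
print(bytes)
```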
speaker <- readLines("speaker.txt")
speaker[2]
[1] "年會總召, 中央研究院資訊科學研究所/ 研究員"
length(speaker)
[1] 216
file.size("speaker.txt")
[1] 18464
object.size(speaker)
24664 bytes
gc()
gc performs the following actions:
tracemem
> x <- c(1, 2, 3)
> tracemem(x)
[1] "<0x8de5838>"
> y <- x
> y[2] <- 3
tracemem[0x8de5838 -> 0x7f99070]
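The tracemem line above reports that the vector was copied when `y` was modified, not when `y <- x` was executed. The sketch below checks the visible effect of this copy-on-modify behaviour. tracemem() is only available when R was built with memory profiling, so the traced part is guarded with capabilities("profmem"); the names `z` and `w` are illustrative:

```r
x <- c(1, 2, 3)
y <- x        # no copy yet: both names refer to the same vector
y[2] <- 3     # modifying y makes a copy, so x is left untouched

x_unchanged <- identical(x, c(1, 2, 3))
y_changed   <- identical(y, c(1, 3, 3))

# Only trace when this build of R supports memory profiling.
if (capabilities("profmem")) {
  z <- c(1, 2, 3)
  tracemem(z)
  w <- z
  w[2] <- 3   # a "tracemem[... -> ...]" line is printed at this point
  untracemem(z)
}
```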