Title: | Descriptive Statistics Functions for Numeric Data |
Version: | 0.1.2 |
Description: | Provides fundamental functions for descriptive statistics, including MODE(), estimate_mode(), center_stats(), position_stats(), pct(), spread_stats(), kurt(), skew(), and shape_stats(), which assist in summarizing the center, spread, and shape of numeric data. For more details, see McCurdy (2025), "Introduction to Data Science with R" https://jonmccurdy.github.io/Introduction-to-Data-Science/. |
License: | MIT + file LICENSE |
Encoding: | UTF-8 |
RoxygenNote: | 7.3.2 |
Depends: | R (≥ 3.5) |
LazyData: | true |
Suggests: | roxygen2 |
NeedsCompilation: | no |
Packaged: | 2025-07-20 21:01:46 UTC; lukepapayoanou |
Author: | Luke Papayoanou [aut], Jon McCurdy [aut, cre] |
Maintainer: | Jon McCurdy <j.r.mccurdy@msmary.edu> |
Repository: | CRAN |
Date/Publication: | 2025-07-22 11:01:57 UTC |
MSMU: Fundamental Data Functions Package
Description
The MSMU package provides core functions for descriptive statistics and exploratory data analysis. It includes functions for computing central tendency, spread, shape, and position statistics, along with utility functions for estimating modes and standardized ranges. The package contains both functions and datasets.
Author(s)
Luke Papayoanou, Jon McCurdy
Find the Mode of a Numeric Vector
Description
Calculates the mode (most frequent value) of a numeric vector. If there is a tie, returns all values that share the highest frequency.
Usage
MODE(x)
Arguments
x |
A numeric vector. |
Value
A numeric value (or vector) representing the mode(s) of x.
Examples
# Mode of a Numeric Vector
MODE(c(1,2,3,3,3,4,5,5,3,8))
# Mode of the number of cylinders in mtcars dataset
data("mtcars")
MODE(mtcars$cyl)
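The tie-handling rule described for MODE() can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation; the helper name mode_sketch is hypothetical.

```r
# Hypothetical helper: returns every value that shares the highest frequency.
mode_sketch <- function(x) {
  counts <- table(x)                           # frequency of each distinct value (NA dropped)
  top <- names(counts)[counts == max(counts)]  # all values tied for the highest count
  as.numeric(top)                              # table() stores values as names; convert back
}

mode_sketch(c(1, 2, 3, 3, 3, 4, 5, 5, 3, 8))  # 3
mode_sketch(c(1, 1, 2, 2, 3))                 # 1 2 (a tie returns both values)
```

Converting through names() is adequate for simple values like these; values with many decimal places may lose precision in the round trip.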
Professional baseball teams data
Description
This dataset contains historical performance and statistics for professional baseball teams across multiple seasons from 2000 to 2020.
Usage
baseball_teams
Format
A data frame with 630 rows and 12 columns:
- year
Year (integer)
- team_name
Team (character)
- games_played
Number of games played (integer)
- wins
Number of wins (integer)
- losses
Number of losses (integer)
- world_series
World series winner that specific year (character)
- runs_scored
Number of total runs scored during season (integer)
- hits
Number of total hits during season (integer)
- homeruns
Number of total homeruns during season (integer)
- earned_run_average
Team earned run average per 9 innings (numeric)
- fielding_percentage
Team fielding percentage (numeric)
- home_attendance
Average home game attendance (integer)
Source
Data retrieved from Lahman's Baseball Database, with alterations made for educational purposes.
College basketball data
Description
This dataset contains performance statistics for 363 men’s college basketball teams from the 2022-23 season.
Usage
basketball
Format
A data frame with 363 rows and 18 columns:
- School
School (character)
- State
State (character)
- W
Wins (integer)
- L
Losses (integer)
- W.L.
Win-loss percentage (numeric)
- SRS
Simple Rating System (numeric)
- SOS
Strength of Schedule (numeric)
- Points.Scored
Points scored (integer)
- Points.Allowed
Points allowed (integer)
- FG.
Team field goal percentage (numeric)
- X3P.
Three point percentage (numeric)
- FT.
Free throw percentage (numeric)
- Rebounds
Number of rebounds (integer)
- AST
Number of assists (integer)
- STL
Number of steals (integer)
- Blocks
Number of blocks (integer)
- Turn.Overs
Number of turnovers (integer)
- Fouls
Number of fouls (integer)
Source
Data retrieved from Sports Reference with alterations made for educational purposes.
Summary of Central Tendency
Description
Computes a variety of center statistics for a numeric vector, including the mean, median, trimmed means (10% and 25%), and estimated mode (via the probability density function, using estimate_mode()).
Usage
center_stats(x)
Arguments
x |
A numeric vector. |
Value
A named numeric vector with values for:
- mean
Arithmetic mean
- median
Median
- trim25
25% trimmed mean
- trim10
10% trimmed mean
- est_mode
Estimated mode from estimate_mode()
Examples
# Center Stats of continuous random data
set.seed(123)
x <- rnorm(1000, mean=50, sd=10)
center_stats(x)
# Center Stats of Sepal Length in iris data set
data("iris")
center_stats(iris$Sepal.Length)
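To illustrate why trimmed means are reported alongside the mean, here is a small base R sketch. The helper name center_sketch is hypothetical, and it assumes the trimmed means correspond to the trim argument of base mean(), which drops that fraction of observations from each end; whether center_stats() computes them exactly this way is an assumption.

```r
# Hypothetical helper: mean, median, and trimmed means with base R only.
center_sketch <- function(x) {
  x <- x[!is.na(x)]                    # assumption: missing values are ignored
  c(
    mean   = mean(x),
    median = median(x),
    trim25 = mean(x, trim = 0.25),     # drops 25% of observations from each end
    trim10 = mean(x, trim = 0.10)      # drops 10% of observations from each end
  )
}

# One large outlier pulls the mean up but leaves the other measures at 5.5.
center_sketch(c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100))
# mean = 14.5, median = 5.5, trim25 = 5.5, trim10 = 5.5
```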
Christmas data
Description
Santa's dataset, exploring whether Santa gives children presents based on a variety of variables!
Usage
christmas
Format
A data frame with 1000 rows and 15 columns:
- Gender
Gender (character)
- Toy_Count
Number of toys (integer)
- Chores_Completed
Number of chores completed (numeric)
- Favorite_Color
Child's favorite color (character)
- Helping_Hand
Child's helping hand number/score (integer)
- Complaints_Received
Number of complaints the child makes (numeric)
- Tantrum_Count
Number of tantrums the child has (integer)
- Rule_Breaks
Number of times the child breaks rules (numeric)
- Sharing_Behavior
Child's willingness to share (numeric)
- Hours_of_Sleep
Child's average hours of sleep per night (numeric)
- Screen_Time
Child's average hours of screen time (numeric)
- School_Grade
Child's school grade (numeric)
- Parent_Presence
Child's parent presence (numeric)
- Greed_Score
Santa's numeric score for labeling a child's greed (numeric)
- Outcome
Whether a child gets a present or coal (character)
Source
Santa
Class demographics
Description
A sample dataset representing demographic and academic information for 50 college students.
Usage
class_demographics
Format
A data frame with 50 rows and 6 columns:
- names
Person's name (character)
- ages
Person's age (integer)
- state
Person's state (character)
- year
Person's year in college (character)
- majors
Person's major (character)
- sport
Binary sport indicator, 1 (yes) or 0 (no) (integer)
Source
Synthetic Data
College data
Description
This dataset provides detailed information on 777 U.S. colleges and universities from 1995, covering aspects of admissions, academics, finances, and student demographics.
Usage
college_data
Format
A data frame with 777 rows and 16 columns:
- Name
College name (character)
- Region
US region (character)
- Accept
Acceptance (integer)
- Enroll
Enrollment (integer)
- Top10perc
Percent of students who were in the top 10% of their high school class (integer)
- Top25perc
Percent of students who were in the top 25% of their high school class (integer)
- F.Undergrad
Full-time undergraduates (integer)
- P.Undergrad
Part-time undergraduates (integer)
- Outstate
Number of out-of-state students (integer)
- Room.Board
Annual room and board price (integer)
- PhD
Percentage of faculty with a PhD (integer)
- Terminal
Percentage of faculty with a terminal degree (integer)
- S.F.Ratio
Student-faculty ratio (numeric)
- perc.alumni
Percent of alumni who donate to the college (integer)
- Expend
Instructional expenditure per student (integer)
- Grad.Rate
Graduation rate (integer)
Source
This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University. Adapted from the College data set in the ISLR library with alterations made for educational purposes.
County data
Description
Data for 3142 counties in the United States containing demographic, educational, economic, and technological statistics.
Usage
county_data
Format
A data frame with 3142 rows and 17 columns:
- state
State (character)
- name
County name (character)
- fips
County level FIPS code (integer)
- pop
County population (integer)
- households
Number of households (integer)
- median_age
Median age of people in county (numeric)
- age_over_18
Percent of people over 18 (numeric)
- age_over_65
Percent of people over 65 (numeric)
- hs_grad
Percent of high school graduates (numeric)
- bachelors
Percent of people with bachelor's degrees (numeric)
- white
Percent of population that is white (numeric)
- black
Percent of population that is black (numeric)
- hispanic
Percent of population that is Hispanic (numeric)
- household_has_smartphone
Percent of households who have a smartphone (numeric)
- mean_household_income
Average household income (integer)
- median_household_income
Median household income (integer)
- unemployment_rate
Unemployment rate (numeric)
Source
Adapted from the county_complete data set in the usdata library with alterations made for educational purposes.
Course scores data
Description
This dataset contains academic performance records for 200 students across four years of high school, with scores or letter grades in English and Math.
Usage
course_scores
Format
A data frame with 200 rows and 10 columns:
- student
Student ID (integer)
- type
Grade type (character)
- Freshman_English
Freshman English Score/letter grade (character)
- Freshman_Math
Freshman Math Score/letter grade (character)
- Sophomore_English
Sophomore English Score/letter grade (character)
- Sophomore_Math
Sophomore Math Score/letter grade (character)
- Junior_English
Junior English Score/letter grade (character)
- Junior_Math
Junior Math Score/letter grade (character)
- Senior_English
Senior English Score/letter grade (character)
- Senior_Math
Senior Math Score/letter grade (character)
Source
Synthetic Data
Synthetic Census dataset
Description
A synthetic dataset containing demographic and socioeconomic information for 1,000 individuals.
Usage
data_210_census
Format
A data frame with 1000 rows and 5 columns:
- age
Person's age (integer)
- gender
Person's gender (character)
- degree
Person's level of education (character)
- salary
Person's yearly salary (integer)
- height
Person's height in inches (integer)
Source
Synthetic Data
2020 election data
Description
Dataset providing detailed results from the 2020 U.S. presidential election at the county level.
Usage
election_2020
Format
A data frame with 32177 rows and 7 columns:
- state
State (character)
- state_ev
State electoral votes (integer)
- county
County name (character)
- candidate
Candidate name (character)
- party
Candidate party (character)
- total_votes
Total number of votes (integer)
- won
Whether the candidate won the county (logical)
Source
Data retrieved from MIT Election Data and Science Lab, 2018, "County Presidential Election Returns 2000-2020", with alterations made for educational purposes.
Estimate Mode using Density function to find Mode of continuous data
Description
Estimates the mode of a numeric vector by identifying the value corresponding to the peak of its estimated probability density function.
Usage
estimate_mode(x)
Arguments
x |
A numeric vector. Missing values ( |
Value
A single numeric value representing the estimated mode.
Examples
# Estimate the mode of continuous random data
set.seed(123)
x <- rnorm(1000, mean=5, sd=2)
estimate_mode(x)
# Estimate the mode of miles-per-gallon (mpg) in the mtcars dataset
data("mtcars")
estimate_mode(mtcars$mpg)
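The idea of taking the peak of an estimated density can be sketched with base R's density(). This is a rough illustration, not the package's implementation: the helper name estimate_mode_sketch is hypothetical, and the default Gaussian kernel and bandwidth are assumptions.

```r
# Hypothetical helper: value at which a kernel density estimate is highest.
estimate_mode_sketch <- function(x) {
  x <- x[!is.na(x)]      # assumption: missing values are dropped first
  d <- density(x)        # kernel density estimate on a grid (default settings)
  d$x[which.max(d$y)]    # grid point with the largest estimated density
}

set.seed(123)
x <- rnorm(1000, mean = 5, sd = 2)
estimate_mode_sketch(x)  # expected to land near the true mode of 5
```

Because the result depends on the bandwidth, a different smoothing choice can shift the estimate slightly.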
Exam data
Description
Synthetic dataset containing academic performance and background information for 1,000 students.
Usage
exam_data
Format
A data frame with 1000 rows and 8 columns:
- gender
Student's gender (character)
- race.ethnicity
Student's race/ethnicity (character)
- parental.level.of.education
Parents' level of education (character)
- lunch
Student's lunch plan (character)
- test.preparation.course
Student's test preparation level (character)
- math.score
Student's math score (integer)
- reading.score
Student's reading score (integer)
- writing.score
Student's writing score (integer)
Source
Data retrieved from roycekimmons generated data
Football/Quarterback data
Description
Dataset containing performance statistics for 106 football players who attempted a pass in the NFL for the 2022 season.
Usage
football
Format
A data frame with 106 rows and 17 columns:
- Player
Player's name (character)
- Tm
Player's team (character)
- Age
Player's age (integer)
- Pos
Player's position (character)
- G
Number of games (integer)
- GS
Number of games started (integer)
- Wins
Number of wins (integer)
- Cmp
Number of completions (integer)
- Att
Number of throwing attempts (integer)
- Cmp.
Completion percentage (numeric)
- Yds
Number of yards thrown (integer)
- TD
Number of touchdowns (integer)
- Int
Number of interceptions thrown (integer)
- Y.A
Yards per attempt (numeric)
- Y.G
Yards per game (numeric)
- Rate
Passer rating (numeric)
- QBR
Total Quarterback Rating (numeric)
Source
Data retrieved from Pro Football Reference with alterations made for educational purposes.
Heart data
Description
Dataset containing medical and diagnostic information for 303 patients, used to study the presence of Atherosclerotic Heart Disease (AHD).
Usage
heart
Format
A data frame with 303 rows and 14 columns:
- Age
Patient's age (integer)
- Sex
Patient's sex (1 = Male, 0 = Female) (integer)
- ChestPain
Chest pain type (character)
- RestBP
Resting blood pressure (in mm Hg on admission to the hospital) (integer)
- Chol
Serum cholesterol in mg/dl (integer)
- Fbs
Fasting blood sugar > 120 mg/dl (1 = true; 0 = false) (integer)
- RestECG
Resting electrocardiographic results (integer)
- MaxHR
Maximum heart rate achieved (integer)
- ExAng
Exercise induced angina (1 = yes; 0 = no) (integer)
- Oldpeak
ST depression induced by exercise relative to rest (numeric)
- Slope
The slope of the peak exercise ST segment (integer)
- Ca
Number of major vessels (0-3) colored by fluoroscopy (integer)
- Thal
Thal condition (character)
- AHD
Atherosclerotic heart disease condition (character)
Source
Data retrieved from UC Irvine Machine Learning Repository
Housing data
Description
Data on houses that were recently sold in the Duke Forest neighborhood of Durham, NC in November 2020.
Usage
housing_data
Format
A data frame with 98 rows and 6 columns:
- price
Home price (numeric)
- bed
Number of bedrooms (integer)
- bath
Number of bathrooms (numeric)
- area
Square footage (integer)
- year_built
Year the house was built (integer)
- lot
Lot size (numeric)
Source
Adapted from the duke_forest dataset in the openintro library with alterations made for educational purposes.
Income data
Description
Dataset containing basic demographic and financial information for 20 individuals.
Usage
income_data
Format
A data frame with 20 rows and 5 columns:
- ID
ID (integer)
- Ages
Age (integer)
- Years_til_Retirement.65
Years until retirement at 65 (integer)
- Salary
Salary (integer)
- Birth_weight
Birth weight (integer)
Source
Synthetic Data
Compute Sample Kurtosis
Description
Calculates the kurtosis of a numeric vector. A value near 0 suggests normal kurtosis (mesokurtic), positive values indicate heavier tails (leptokurtic), and negative values indicate lighter tails (platykurtic).
Usage
kurt(x)
Arguments
x |
A numeric vector. |
Details
The z-scores are computed as:
z_i = \frac{x_i - \bar{x}}{sd}
The kurtosis is then calculated as:
\text{Kurtosis} = \frac{1}{n} \sum_{i=1}^{n} z_i^4 - 3
where \bar{x} is the mean of x, sd is the standard deviation of x, and n is the number of observations.
Value
A single numeric value representing the kurtosis.
Examples
# Kurtosis of mpg in mtcars
data("mtcars")
kurt(mtcars$mpg)
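The formula in the Details section translates almost line for line into base R. The sketch below is an illustration only: the helper name kurt_sketch is hypothetical, and it assumes sd refers to R's sd(), which uses an n - 1 denominator.

```r
# Hypothetical helper: excess kurtosis as the mean of fourth-power z-scores minus 3.
kurt_sketch <- function(x) {
  x <- x[!is.na(x)]             # assumption: missing values are ignored
  z <- (x - mean(x)) / sd(x)    # z_i = (x_i - xbar) / sd
  mean(z^4) - 3                 # (1/n) * sum(z_i^4) - 3
}

kurt_sketch(c(1, 2, 3, 4, 5))   # -1.912 (lighter tails than a normal distribution)
```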
Ledger data
Description
Dataset mimicking a ledger showing the price an item was bought and sold for, the date it occurred, and the color of the product.
Usage
ledger_data
Format
A data frame with 4 rows and 104 columns:
- color
Color (character)
- type
age (integer)
- Jan_08
Price on date (numeric)
- Jan_15
Price on date (numeric)
- Jan_16
Price on date (numeric)
- Jan_31
Price on date (numeric)
- Feb_02
Price on date (numeric)
- Feb_03
Price on date (numeric)
- Feb_04
Price on date (numeric)
- Feb_14
Price on date (numeric)
- Feb_20
Price on date (numeric)
- Feb_22
Price on date (numeric)
- Feb_25
Price on date (numeric)
- Feb_27
Price on date (numeric)
- Feb_28
Price on date (numeric)
- Mar_01
Price on date (numeric)
- Mar_05
Price on date (numeric)
- Mar_09
Price on date (numeric)
- Mar_12
Price on date (numeric)
- Mar_16
Price on date (numeric)
- Mar_20
Price on date (numeric)
- Mar_21
Price on date (numeric)
- Mar_22
Price on date (numeric)
- Mar_24
Price on date (numeric)
- Mar_27
Price on date (numeric)
- Mar_28
Price on date (numeric)
- Mar_31
Price on date (numeric)
- Apr_06
Price on date (numeric)
- Apr_08
Price on date (numeric)
- Apr_10
Price on date (numeric)
- Apr_18
Price on date (numeric)
- Apr_19
Price on date (numeric)
- Apr_24
Price on date (numeric)
- Apr_26
Price on date (numeric)
- Apr_29
Price on date (numeric)
- May_01
Price on date (numeric)
- May_04
Price on date (numeric)
- May_12
Price on date (numeric)
- May_17
Price on date (numeric)
- May_24
Price on date (numeric)
- May_25
Price on date (numeric)
- May_28
Price on date (numeric)
- Jun_01
Price on date (numeric)
- Jun_04
Price on date (numeric)
- Jun_11
Price on date (numeric)
- Jun_16
Price on date (numeric)
- Jun_25
Price on date (numeric)
- Jun_28
Price on date (numeric)
- Jul_03
Price on date (numeric)
- Jul_04
Price on date (numeric)
- Jul_08
Price on date (numeric)
- Jul_10
Price on date (numeric)
- Jul_11
Price on date (numeric)
- Jul_13
Price on date (numeric)
- Jul_18
Price on date (numeric)
- Jul_23
Price on date (numeric)
- Jul_25
Price on date (numeric)
- Aug_05
Price on date (numeric)
- Aug_12
Price on date (numeric)
- Aug_13
Price on date (numeric)
- Aug_24
Price on date (numeric)
- Aug_26
Price on date (numeric)
- Sep_02
Price on date (numeric)
- Sep_06
Price on date (numeric)
- Sep_07
Price on date (numeric)
- Sep_08
Price on date (numeric)
- Sep_16
Price on date (numeric)
- Sep_21
Price on date (numeric)
- Sep_22
Price on date (numeric)
- Sep_23
Price on date (numeric)
- Sep_27
Price on date (numeric)
- Oct_07
Price on date (numeric)
- Oct_09
Price on date (numeric)
- Oct_10
Price on date (numeric)
- Oct_15
Price on date (numeric)
- Oct_16
Price on date (numeric)
- Oct_17
Price on date (numeric)
- Oct_19
Price on date (numeric)
- Oct_20
Price on date (numeric)
- Oct_21
Price on date (numeric)
- Oct_22
Price on date (numeric)
- Oct_29
Price on date (numeric)
- Oct_30
Price on date (numeric)
- Oct_31
Price on date (numeric)
- Nov_03
Price on date (numeric)
- Nov_04
Price on date (numeric)
- Nov_12
Price on date (numeric)
- Nov_13
Price on date (numeric)
- Nov_14
Price on date (numeric)
- Nov_16
Price on date (numeric)
- Nov_18
Price on date (numeric)
- Nov_23
Price on date (numeric)
- Nov_24
Price on date (numeric)
- Dec_02
Price on date (numeric)
- Dec_03
Price on date (numeric)
- Dec_06
Price on date (numeric)
- Dec_11
Price on date (numeric)
- Dec_12
Price on date (numeric)
- Dec_13
Price on date (numeric)
- Dec_16
Price on date (numeric)
- Dec_17
Price on date (numeric)
- Dec_18
Price on date (numeric)
- Dec_19
Price on date (numeric)
- Dec_26
Price on date (numeric)
Source
Synthetic Data
MLB data
Description
Batter statistics for the 2018 Major League Baseball season.
Usage
mlb_eda
Format
A data frame with 1270 rows and 13 columns:
- name
Player's name (character)
- team
Player's team (character)
- position
Player's position (character)
- games
Number of games (integer)
- AB
Number of at bats (integer)
- R
Number of runs (integer)
- H
Number of hits (integer)
- doubles
Number of doubles (integer)
- HR
Number of home runs (integer)
- RBI
Number of runs batted in (integer)
- AVG
Player's batting average (numeric)
- SLG
Player's slugging percentage (numeric)
- OPS
Player's on-base plus slugging (numeric)
Source
Data retrieved from MLB, with alterations made for educational purposes.
Mount St. Mary's dorm data
Description
Dataset summarizing the distribution of male and female students across various dormitories at Mount College, categorized by academic year.
Usage
mount_dorms
Format
A data frame with 4 rows and 11 columns:
- year
Student's year (character)
- m_Pangborn
Males living in Pangborn (integer)
- m_Sheridan
Males living in Sheridan (integer)
- m_Terrace
Males living in Terrace (integer)
- m_Powell
Males living in Powell (integer)
- m_Towers
Males living in the Towers (integer)
- f_Pangborn
Females living in Pangborn (integer)
- f_Sheridan
Females living in Sheridan (integer)
- f_Terrace
Females living in Terrace (integer)
- f_Powell
Females living in Powell (integer)
- f_Towers
Females living in the Towers (integer)
Source
Synthetic Data
Percent Within N Standard Deviations of the Mean
Description
Calculates the percentage of values in a numeric vector that fall within n standard deviations of the mean.
Usage
pct(x, n)
Arguments
x |
A numeric vector. |
n |
A positive numeric value indicating how many standard deviations from the mean to use as bounds. |
Value
A single numeric value representing the percentage (0–100) of values within the specified range.
Examples
# Percentage of values that fall within 2 sds of the mean in random normal data
set.seed(123)
x <- rnorm(1000)
pct(x,2)
# Percentage of values that fall within 2 sds of the mean in iris Sepal Lengths
data("iris")
pct(iris$Sepal.Length, 2)
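The calculation behind this statistic can be sketched in a few lines of base R. This is an illustration under stated assumptions rather than the package's code: the helper name pct_sketch is hypothetical, the bounds are treated as inclusive, and sd() is R's sample standard deviation.

```r
# Hypothetical helper: percentage of values within n standard deviations of the mean.
pct_sketch <- function(x, n) {
  x <- x[!is.na(x)]                         # assumption: missing values are ignored
  inside <- abs(x - mean(x)) <= n * sd(x)   # TRUE for values inside the bounds
  100 * mean(inside)                        # share of TRUE values, as a percentage
}

pct_sketch(c(1, 2, 3, 4, 5), 1)  # 60  (2, 3, and 4 lie within one sd of the mean)
pct_sketch(c(1, 2, 3, 4, 5), 2)  # 100
```

For roughly normal data, the results for n = 1, 2, and 3 should be close to 68, 95, and 99.7.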
Computes Position Statistics, Quintiles and Quartiles
Description
Calculates the quantiles of a numeric vector, including quartiles (data is split into 4 equal parts) and quintiles (data is split into 5 equal parts), using the quantile() function. NAs are removed.
Usage
position_stats(x)
Arguments
x |
A numeric vector. |
Details
Percentiles are values that divide a dataset into 100 equal parts, each representing 1% of the distribution. For example, the 25th percentile is the value below which 25% of the data fall.
Quartiles are special percentiles that divide the data into four equal groups: Q1 (25th percentile), Q2 (50th percentile or median), Q3 (75th percentile).
Quintiles divide the data into five equal groups, each representing 20% of the distribution; the 20th, 40th, 60th, and 80th percentiles are the cut points.
Value
A list with two elements:
- quint
Numeric vector of quintiles (0%, 20%, 40%, ..., 100%)
- quart
Numeric vector of quartiles (0%, 25%, 50%, 75%, 100%)
Examples
# Position stats of random data
set.seed(123)
x <- rnorm(1000)
position_stats(x)
# Position stats of MPG in mtcars data set
data("mtcars")
position_stats(mtcars$mpg)
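Since the description names quantile() as the underlying function, the two sets of cut points can be reproduced directly in base R. The helper name position_sketch is hypothetical, and the default quantile type is an assumption.

```r
# Hypothetical helper: quintiles and quartiles via base R's quantile().
position_sketch <- function(x) {
  list(
    quint = quantile(x, probs = seq(0, 1, by = 0.20), na.rm = TRUE),  # 0%, 20%, ..., 100%
    quart = quantile(x, probs = seq(0, 1, by = 0.25), na.rm = TRUE)   # 0%, 25%, ..., 100%
  )
}

res <- position_sketch(0:100)
res$quart  # 0, 25, 50, 75, 100 (named 0%, 25%, 50%, 75%, 100%)
```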
Reaction Data
Description
This dataset contains synthetic reaction time measurements for 100 individuals under different conditions.
Usage
reaction_time
Format
A data frame with 100 rows and 6 columns:
- person
Person ID (integer)
- color
Color (character)
- left
Left (numeric)
- right
Right (numeric)
- age
Person age (numeric)
- gender
Person gender (character)
Source
Synthetic Data
Computes Sample Skew and Kurtosis
Description
Calculates the skewness of a numeric vector (via skew()). A positive value indicates right skew (long right tail), while a negative value indicates left skew (long left tail). A zero value represents symmetry.
Calculates the kurtosis of a numeric vector (via kurt()). A value near 0 suggests normal kurtosis (mesokurtic), positive values indicate heavier tails (leptokurtic), and negative values indicate lighter tails (platykurtic).
Usage
shape_stats(x)
Arguments
x |
A numeric vector. |
Value
A list with two elements:
- skew
Skew of the data, from skew()
- kurt
Kurtosis of the data, from kurt()
Examples
# Shape stats of mpg in mtcars
data("mtcars")
shape_stats(mtcars$mpg)
Compute Sample Skewness
Description
Calculates the skewness of a numeric vector. A positive value indicates right skew (long right tail), while a negative value indicates left skew (long left tail). A zero value represents symmetry.
Usage
skew(x)
Arguments
x |
A numeric vector. |
Value
A single numeric value representing the skewness of the distribution.
Examples
# Skew of Sepal Lengths in iris
data("iris")
skew(iris$Sepal.Length)
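The sign convention described above can be checked with a small base R sketch. The manual does not state the formula used by skew(), so this version assumes a definition analogous to the kurtosis formula (the mean of cubed z-scores); the helper name skew_sketch is hypothetical.

```r
# Hypothetical helper: skewness as the mean of cubed z-scores (an assumed definition).
skew_sketch <- function(x) {
  x <- x[!is.na(x)]             # assumption: missing values are ignored
  z <- (x - mean(x)) / sd(x)    # standardize with the sample standard deviation
  mean(z^3)
}

skew_sketch(c(1, 2, 3, 4, 5))   # 0 for symmetric data
skew_sketch(c(1, 2, 3, 4, 10))  # positive: the large value creates a long right tail
```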
Historic soccer data
Description
This dataset contains historical match results from various international soccer games between different countries for the years 1872-2024.
Usage
soccer
Format
A data frame with 13750 rows and 5 columns:
- date
Date of match (character)
- home_team
Home team name (character)
- away_team
Away team name (character)
- home_score
Home team's goal count (integer)
- away_score
Away team's goal count (integer)
Source
Data retrieved from Kaggle International football results dataset with alterations made for educational purposes.
Summary of Spread Statistics
Description
Computes a variety of spread statistics for a numeric vector, including the standard deviation, IQR, the normalized minimum, maximum, and range, as well as the percentage of data within 1, 2, and 3 standard deviations (via pct()).
Usage
spread_stats(x)
Arguments
x |
A numeric vector. |
Value
- sd
Standard deviation
- iqr
Interquartile range
- minz
Normalized minimum
- maxz
Normalized maximum
- diffz
Normalized range
- pct1
Percent of data within 1 standard deviation, from pct()
- pct2
Percent of data within 2 standard deviations, from pct()
- pct3
Percent of data within 3 standard deviations, from pct()
Examples
# Spread stats of random normal data
set.seed(123)
x <- rnorm(1000)
spread_stats(x)
# Spread stats of mpg in mtcars
data("mtcars")
spread_stats(mtcars$mpg)
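The non-percentage entries in the Value list can be approximated in base R. This sketch is an illustration, not the package's implementation: the helper name spread_sketch is hypothetical, and it assumes "normalized" means expressed as a z-score, that is, (value - mean) / sd.

```r
# Hypothetical helper: sd, IQR, and z-scored minimum, maximum, and range.
spread_sketch <- function(x) {
  x <- x[!is.na(x)]                    # assumption: missing values are ignored
  s <- sd(x)
  minz <- (min(x) - mean(x)) / s       # how many sds the minimum lies below the mean
  maxz <- (max(x) - mean(x)) / s       # how many sds the maximum lies above the mean
  c(sd = s, iqr = IQR(x), minz = minz, maxz = maxz, diffz = maxz - minz)
}

spread_sketch(c(1, 2, 3, 4, 5))
# iqr is 2; minz and maxz are equal in size with opposite signs for symmetric data
```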