Hi, I am “Lynzy”
I am here to explain our project
Our project
is a research-based project
focused on
retrieving and understanding logs
from Apache Mesos
as a step toward creating
AI-based anomaly detection
in the logging system
for CERN
Let’s start with CERN
CERN
is known as
"the European Organization for Nuclear Research"
which researches particle physics
to study the composition of matter
in the universe
The CERN laboratory sits
on the French-Swiss border
near Geneva
Here
there is
an accelerator complex
Each accelerator
boosts a beam of particles
to high energies
before the beams are made
to collide with each other
or with stationary targets
or the beam is injected into
the next machine in the sequence
Moreover
there are detectors
used to observe
and record the results
of these collisions
ALICE (A Large Ion Collider Experiment)
is one of the detectors at CERN
It is designed to study the physics
of strongly interacting matter
at extreme energy densities
where a phase of matter
called
"quark-gluon plasma" forms
CERN recently
launched a new monitoring system
for the ALICE detector
This monitoring system
currently
does not have any anomaly detection
in the logging system
to support predictive maintenance
Our project
is part of a collaborative project
with the CERN
ALICE Data Acquisition team
to create AI-based anomaly detection
for the logging system
in order to
enhance predictive maintenance
and failure management
in CERN's ALICE system
Predictive maintenance
can help CERN know
which server or service
is going to fail
so that it can be fixed
before it fails
Without anomaly detection
in the ALICE logging system
a failure of computing nodes
or computing services
can cause data loss
and such a failure
can occur at any time
If a node or service fails
during the night
the chance that someone
will be able to resolve the problem immediately
is very low
because fewer staff are available
This research project
focuses on
retrieving logs
and understanding the Apache Mesos logging system
The retrieved logs will be passed
to the data science team
for further analysis
and to create AI-based anomaly detection
Our project
is the crucial first stage
covering environment setup
and log investigation
There are many services running at CERN
However
the top-priority service is
Apache Mesos
as the detector
needs to manage
a large number of nodes (a cluster)
at the same time
while reading out data
So we chose Apache Mesos
as the initial service
for our experiments
before moving on to integrate
the anomaly detection system
with other services
We have been talking a lot about Apache Mesos
Please let me introduce it to you
Apache Mesos
is an open-source project
for managing computer clusters
Mesos provides applications with APIs
for resource management and scheduling
across entire data center
and cloud environments
It is like
an airport dispatcher
who assigns time slots
to airplanes
to land or take off
with runways as computer nodes
and airplanes as computing tasks
Although a cluster
consists of many computer nodes
Mesos makes it feel as if
we manage
only a single pool of resources
with Mesos deciding
when and which resources
will be used
efficiently
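To make the "single pool of resources" idea concrete, here is a minimal toy sketch in Python. It is not the Mesos API and does not reproduce how Mesos actually schedules work; the names `Node`, `assign`, the node names, and the task names are all hypothetical, and the first-fit rule is only an assumption chosen for illustration. In the airport picture, the nodes are the runways and the tasks are the airplanes.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical cluster node with its currently free resources."""
    name: str
    cpus: float
    mem_mb: int
    tasks: list = field(default_factory=list)


def assign(nodes, task_name, cpus, mem_mb):
    """First-fit placement: use the first node with enough free CPU and memory."""
    for node in nodes:
        if node.cpus >= cpus and node.mem_mb >= mem_mb:
            node.cpus -= cpus
            node.mem_mb -= mem_mb
            node.tasks.append(task_name)
            return node.name
    return None  # no node can host the task right now


# The caller only sees one pool; it never picks a node itself.
pool = [
    Node("node-a", cpus=2.0, mem_mb=4096),
    Node("node-b", cpus=4.0, mem_mb=8192),
]
placements = {
    "readout": assign(pool, "readout", cpus=2.0, mem_mb=2048),
    "monitor": assign(pool, "monitor", cpus=1.0, mem_mb=1024),
    "too-big": assign(pool, "too-big", cpus=8.0, mem_mb=1024),
}
print(placements)
```

The point of the sketch is only that the caller asks the pool for resources and the placement decision is made for it, which is the role Mesos plays in the talk.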
Our method
was first
to research and explore
all the related documentation
about Apache Mesos
and the ALICE control systems
Next
we retrieved logs from Apache Mesos
Then we investigated the events in the logs
by comparing them
with basic system scenarios
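As a rough illustration of the log investigation step, the sketch below splits log lines into fields so that events can be counted and compared against a scenario. It assumes the lines follow the glog-style layout (severity letter, month and day, time, thread id, source file and line, then the message); the sample lines and their messages are hypothetical and are not taken from our retrieved logs, and `parse_line` is a helper name introduced only for this example.

```python
import re
from collections import Counter

# Assumed glog-style layout: Lmmdd hh:mm:ss.uuuuuu threadid file:line] message
GLOG_LINE = re.compile(
    r"^(?P<level>[IWEF])(?P<month>\d{2})(?P<day>\d{2}) "
    r"(?P<time>\d{2}:\d{2}:\d{2}\.\d{6})\s+(?P<thread>\d+) "
    r"(?P<source>[^:\s]+):(?P<line>\d+)\] (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log line as a dict, or None if it does not match."""
    match = GLOG_LINE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample lines, written only to exercise the parser.
sample = [
    "I0412 10:15:30.123456 4321 master.cpp:100] hypothetical: task assigned to agent",
    "W0412 10:15:31.000001 4321 slave.cpp:200] hypothetical: resource usage is high",
    "not a log line",
]

events = [e for e in map(parse_line, sample) if e is not None]
by_level = Counter(e["level"] for e in events)  # events per severity letter
print(by_level)
```

Once events are structured like this, they can be lined up by time against a known scenario, such as a task being assigned.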
From the experiment
we can conclude that
Mesos log events
are related to the basic scenarios
in the system
That is
when a task is assigned
the Mesos master and agent
show high CPU and memory usage
As a result
the data science team
can use the results from our experiment
and the logs we retrieved
to help create
an AI-based logging system
for predictive maintenance
Lastly
CERN
can apply
AI-based anomaly detection for the logging system
in the ALICE facilities
to which our project
contributes
Thank you for listening!
Bye ~
